Inferring Human Traits from Facebook Statuses

  • Andrew Cutler
  • Brian Kulis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11185)


This paper explores the use of language models to predict 20 human traits from users’ Facebook status updates. The data were collected by the myPersonality project and include user statuses along with the users’ personality, gender, political identification, religion, race, satisfaction with life, IQ, self-disclosure, fair-mindedness, and belief in astrology. A single interpretable model matches state-of-the-art results on well-studied tasks such as predicting gender and personality, and sets the standard for other traits such as IQ, sensational interests, political identity, and satisfaction with life. We also publish the most highly weighted words for each trait. These lists are valuable for generating hypotheses about human behavior and for understanding what information a model is extracting. Using model performance and the extracted features, we analyze models built on social media data. The real-world problems we explore include gendered classification bias and Cambridge Analytica’s use of psychographic models.
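The idea that an interpretable model exposes "highly weighted words" for a trait can be illustrated with a minimal sketch. It assumes a binary bag-of-words representation and an L2-regularized logistic regression trained by plain gradient descent. This section does not describe the paper's actual features, model, or preprocessing, so none of this should be read as the authors' method. The helper names (`tokenize`, `build_vocab`, `featurize`, `train_logistic`, `top_words`) are hypothetical, and the four statuses and their labels are invented for illustration only.

```python
import math

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle
    # punctuation, emoji, URLs, and rare-word filtering.
    return text.lower().split()

def build_vocab(docs):
    # Map each distinct token to a column index (sorted for reproducibility).
    words = sorted({w for d in docs for w in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def featurize(doc, vocab):
    # Binary bag-of-words: 1.0 if the word occurs in the status, else 0.0.
    x = [0.0] * len(vocab)
    for w in tokenize(doc):
        if w in vocab:
            x[vocab[w]] = 1.0
    return x

def train_logistic(X, y, l2=0.1, lr=0.5, epochs=500):
    # L2-regularized logistic regression fitted by full-batch gradient descent.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Weights are shrunk by the L2 penalty; the bias is left unregularized.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def top_words(w, vocab, k=3):
    # Rank words by coefficient: most positive first, then most negative first.
    index_to_word = {i: word for word, i in vocab.items()}
    order = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    positive = [index_to_word[j] for j in order[:k]]
    negative = [index_to_word[j] for j in reversed(order[-k:])]
    return positive, negative

# Invented toy statuses with a hypothetical binary trait label (1 or 0).
statuses = [
    "loving this sunny day with friends",
    "great day great friends",
    "so tired and stressed about work",
    "another stressful monday at work",
]
labels = [1, 1, 0, 0]

vocab = build_vocab(statuses)
X = [featurize(s, vocab) for s in statuses]
weights, bias = train_logistic(X, labels)
pos, neg = top_words(weights, vocab)
print("highest-weighted words:", pos)
print("lowest-weighted words:", neg)
```

Because every coefficient belongs to a single word, sorting the weights yields a ranked word list for each direction of the trait. On this toy data, "day" and "friends", which occur in both label-1 statuses, head the positive list, and "work", which occurs in both label-0 statuses, heads the negative list. The regularization strength `l2=0.1` is an arbitrary choice here; with real statuses it would normally be tuned on held-out data.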


Keywords: Social media · Psychographic prediction · NLP



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Boston University, Boston, USA
