Understanding Email Writers: Personality Prediction from Email Messages

  • Jianqiang Shen
  • Oliver Brdiczka
  • Juan Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7899)


Email is a ubiquitous communication tool and accounts for a significant portion of social interaction. In this paper, we attempt to infer users' personality from the content of their emails. Such inference can enable valuable applications such as better personalization, recommendation, and targeted advertising. Given the private and sensitive nature of email content, we propose a privacy-preserving approach for collecting email and personality data. We then frame personality prediction in terms of the well-known Big Five personality model and train predictors on features extracted from email. We report the prediction performance of three generative models with different assumptions. Our results show that personality prediction is feasible and that our email feature set can predict personality with reasonable accuracy.
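The pipeline outlined in the abstract (reduce each email to aggregate features, then train a generative predictor per trait) can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the feature names, the binary "high"/"low" trait labels, the toy emails, and the use of Gaussian naive Bayes as a stand-in for a generative model. The privacy angle is only approximated by keeping aggregate counts and discarding the raw text.

```python
import math
import re
from collections import defaultdict

FIRST_PERSON = {"i", "me", "my"}  # hypothetical word list, for illustration only


def extract_features(body):
    """Reduce an email body to aggregate rates; the raw text is not kept."""
    words = re.findall(r"[A-Za-z']+", body.lower())
    n = max(len(words), 1)  # avoid division by zero on empty bodies
    return {
        "word_count": float(len(words)),
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / n,
        "exclamation_rate": body.count("!") / n,
        "question_rate": body.count("?") / n,
    }


class GaussianNaiveBayes:
    """Stand-in generative model: one Gaussian per (class, feature)."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.stats = {}
        for label, members in grouped.items():
            per_feature = {}
            for name in members[0]:
                values = [m[name] for m in members]
                mean = sum(values) / len(values)
                # Small constant keeps the variance strictly positive.
                var = sum((v - mean) ** 2 for v in values) / len(values) + 1e-6
                per_feature[name] = (mean, var)
            log_prior = math.log(len(members) / len(rows))
            self.stats[label] = (log_prior, per_feature)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, per_feature = self.stats[label]
            score = log_prior
            for name, (mean, var) in per_feature.items():
                score -= 0.5 * math.log(2 * math.pi * var)
                score -= (row[name] - mean) ** 2 / (2 * var)
            return score

        return max(self.stats, key=log_posterior)


# Toy data for a single trait (extraversion); labels are invented.
emails = [
    "I love this! My team is great!",
    "Please find the report attached.",
    "I am so excited! Me too!",
    "The meeting is moved to Monday.",
]
labels = ["high", "low", "high", "low"]

model = GaussianNaiveBayes().fit([extract_features(e) for e in emails], labels)
prediction = model.predict(extract_features("I can't wait! My treat!"))
```

In a Big Five setting, one such classifier would be trained per trait, each on the same feature vectors but with its own labels.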


Keywords: Personality, behavior analysis, email, text processing





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jianqiang Shen (1)
  • Oliver Brdiczka (1)
  • Juan Liu (1)
  1. Palo Alto Research Center, Palo Alto, USA
