Prediction of Age, Sentiment, and Connectivity from Social Media Text

  • Thin Nguyen
  • Dinh Phung
  • Brett Adams
  • Svetha Venkatesh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6997)


Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: material that is diverse in topic and style and available at Web scale. We investigate manifest properties of the textual messages in a large corpus of blog posts, including latent topics, psycholinguistic features, and author mood, to analyze the impact of age, emotion, and social connectivity. These properties are found to differ significantly across the examined cohorts, which suggests discriminative features for a number of useful classification tasks. We build high-performing binary classifiers for old versus young bloggers, social versus solo bloggers, and happy versus sad posts. Analysis of the discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Social connectivity is predicted well using topic and linguistic features, while tagged mood plays only a modest role in all classifications.
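The abstract describes binary classifiers (for example, happy versus sad posts) built on psycholinguistic features of post text. The Python sketch below illustrates that general pipeline under loudly stated assumptions; it is not the authors' implementation. The three tiny word lists are hypothetical stand-ins for LIWC-style categories, the six posts are invented toy data, LDA topic features are omitted, and plain logistic regression trained by batch gradient descent is a swapped-in classifier chosen for brevity.

```python
import math
import re

# HYPOTHETICAL mini-lexicon: stand-ins for LIWC-style categories,
# not the real LIWC dictionary.
CATEGORIES = {
    "posemo": {"happy", "love", "great", "fun", "good"},
    "negemo": {"sad", "hate", "awful", "cry", "bad"},
    "self": {"i", "me", "my", "mine"},
}

# Invented toy posts, labelled 1 = happy and 0 = sad (not data from the paper).
POSTS = [
    ("i love this great day", 1),
    ("so happy and having fun", 1),
    ("what a good good time", 1),
    ("i feel sad and i cry", 0),
    ("awful bad day for me", 0),
    ("i hate this so much", 0),
]


def features(text):
    """Fraction of tokens that fall in each category, in CATEGORIES order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # guard against empty posts
    return [sum(t in words for t in tokens) / n for words in CATEGORIES.values()]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=1.0, epochs=5000):
    """Logistic regression fitted by batch gradient descent on mean log loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Residual between predicted probability and the 0/1 label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b


def predict(w, b, text):
    """Return 1 (happy) if the linear score is positive, else 0 (sad)."""
    z = sum(wi * xi for wi, xi in zip(w, features(text))) + b
    return 1 if z > 0 else 0


if __name__ == "__main__":
    w, b = train_logistic([features(t) for t, _ in POSTS], [y for _, y in POSTS])
    for post in ["such a happy fun day", "bad awful night for me"]:
        print(post, "->", predict(w, b, post))
```

Each post becomes a vector of category proportions, which mirrors the per-category word-count style of LIWC features. Further feature columns, such as per-post topic proportions from an LDA model, could be appended to the vector returned by `features` without changing the classifier.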


Keywords (machine-generated, not supplied by the authors): Latent Dirichlet Allocation, Discriminative Feature, Social Connectivity, Latent Topic, Current Mood




  1. 1.
    Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022 (2003)zbMATHGoogle Scholar
  2. 2.
    Dewaele, J.M., Furnham, A.: Personality and speech production: a pilot study of second language learners. Personality and Individual Differences 28(2), 355–365 (2000)CrossRefGoogle Scholar
  3. 3.
    Dunbar, R.I.M.: Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences 16(4), 681–735 (1993)CrossRefGoogle Scholar
  4. 4.
    Freyd, M.: Introverts and extroverts. Psychological Review 31(1), 74–87 (1924)CrossRefGoogle Scholar
  5. 5.
    Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101(90001), 5228–5235 (2004)CrossRefGoogle Scholar
  6. 6.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter 11(1), 10–18 (2009)CrossRefGoogle Scholar
  7. 7.
    Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42(1), 177–196 (2001)CrossRefzbMATHGoogle Scholar
  8. 8.
    Mairesse, F., Walker, M.A., Mehl, M.R., Moore, R.K.: Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research 30(1), 457–500 (2007)zbMATHGoogle Scholar
  9. 9.
    Mihalcea, R., Liu, H.: A corpus-based approach to finding happiness. In: Proceedings of the AAAI Spring Symposium on Computational Approaches to Weblogs (2006)Google Scholar
  10. 10.
    Newman, M.L., Groom, C.J., Handelman, L.D., Pennebaker, J.W.: Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes 45, 211–236 (2008)CrossRefGoogle Scholar
  11. 11.
    Nguyen, T., Phung, D., Adams, B., Venkatesh, S.: Towards discovery of influence and personality traits through social link prediction. In: Procs. of the Int. AAAI Conference on Weblogs and Social Media, ICWSM (2011)Google Scholar
  12. 12.
    Pang, B., Lee, L., Vaithyanathan, S.: Thumbs up?: sentiment classification using machine learning techniques. In: Proceedings of the ACL 2002 Conference on Empirical Methods in Natural Language Processing, vol. 10, pp. 79–86. Association for Computational Linguistics (2002)Google Scholar
  13. 13.
    Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., Booth, R.J.: The development and psychometric properties of LIWC 2007. LIWC Inc., Austin (2007)Google Scholar
  14. 14.
    Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic inquiry and word count (LIWC) [computer software]. LIWC Inc., Austin (2007)Google Scholar
  15. 15.
    Pennebaker, J.W., Stone, L.D.: Words of wisdom: Language use over the life span. Journal of Personality and Social Psychology 85(2), 291–301 (2003)CrossRefGoogle Scholar
  16. 16.
    Quintelier, E.: Differences in political participation between young and old people. Contemporary Politics 13(2), 165 (2007)CrossRefGoogle Scholar
  17. 17.
    Rude, S., Gortner, E.M., Pennebaker, J.: Language use of depressed and depression-vulnerable college students. Cognition & Emotion 18(8), 1121–1133 (2004)CrossRefGoogle Scholar
  18. 18.
    Schler, J., Koppel, M., Argamon, S., Pennebaker, J.: Effects of age and gender on blogging. In: 2006 AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs (2006)Google Scholar
  19. 19.
    Slatcher, R.B., Chung, C.K., Pennebaker, J.W., Stone, L.D.: Winning words: Individual differences in linguistic style among us presidential and vice presidential candidates. Journal of Research in Personality 41(1), 63–75 (2007)CrossRefGoogle Scholar
  20. 20.
    Stirman, S.W., Pennebaker, J.W.: Word use in the poetry of suicidal and nonsuicidal poets. Psychosomatic Medicine 63(4), 517 (2001)CrossRefGoogle Scholar
  21. 21.
    Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29(1), 24 (2010)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thin Nguyen, Curtin University, Australia
  • Dinh Phung, Curtin University, Australia
  • Brett Adams, Curtin University, Australia
  • Svetha Venkatesh, Curtin University, Australia
