Abstract
Social media corpora, including the textual output of blogs, forums, and messaging applications, provide fertile ground for linguistic analysis: the material is diverse in topic and style, and available at Web scale. We investigate manifest properties of textual messages in a large corpus of blog posts, including latent topics, psycholinguistic features, and author mood, to analyze the impact of age, emotion, and social connectivity. These properties differ significantly across the examined cohorts, which suggests discriminative features for a number of useful classification tasks. We build binary classifiers that achieve high performance for old versus young bloggers, social versus solo bloggers, and happy versus sad posts. Analysis of discriminative features shows that age turns upon choice of topic, whereas sentiment orientation is evidenced by linguistic style. Good prediction is achieved for social connectivity using topic and linguistic features, leaving tagged mood a modest role in all classifications.
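To make the classification setup concrete, the following is a minimal sketch of a binary text classifier with a ranking of discriminative features. It is an illustration only and not the authors' pipeline: it assumes plain bag-of-words counts and a multinomial naive Bayes model in place of the topic, psycholinguistic, and mood features named above, and the four toy posts, the labels `young` and `old`, and the function names `train_nb`, `predict`, and `top_features` are all hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model with add-one smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    doc_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = sorted(set().union(*word_counts.values()))
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(word_counts[c].values())
        log_like[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return {"classes": classes, "log_prior": log_prior, "log_like": log_like}


def predict(model, text):
    """Return the class with the highest posterior log score."""
    scores = {}
    for c in model["classes"]:
        score = model["log_prior"][c]
        for w in text.lower().split():
            # Words never seen in training carry no evidence and are skipped.
            if w in model["log_like"][c]:
                score += model["log_like"][c][w]
        scores[c] = score
    return max(scores, key=scores.get)


def top_features(model, positive, negative, k=3):
    """Rank words by log-odds of appearing in `positive` versus `negative`."""
    pos, neg = model["log_like"][positive], model["log_like"][negative]
    ranked = sorted(pos, key=lambda w: pos[w] - neg[w], reverse=True)
    return ranked[:k]


# Hypothetical toy posts; the study itself uses a large blog corpus.
posts = [
    "school homework exam bored",
    "class teacher homework party",
    "work mortgage kids meeting",
    "office meeting tax kids",
]
cohorts = ["young", "young", "old", "old"]

model = train_nb(posts, cohorts)
print(predict(model, "homework and exam tomorrow"))
print(top_features(model, "young", "old", k=1))
```

Ranking words by log-odds is one simple way to inspect which features separate two cohorts; the same idea carries over when the inputs are topic proportions or psycholinguistic category scores rather than raw words.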
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, T., Phung, D., Adams, B., Venkatesh, S. (2011). Prediction of Age, Sentiment, and Connectivity from Social Media Text. In: Bouguettaya, A., Hauswirth, M., Liu, L. (eds) Web Information System Engineering – WISE 2011. WISE 2011. Lecture Notes in Computer Science, vol 6997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24434-6_17
DOI: https://doi.org/10.1007/978-3-642-24434-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24433-9
Online ISBN: 978-3-642-24434-6
eBook Packages: Computer Science, Computer Science (R0)