Abstract
The proper measurement of emotion is vital to understanding the relationship between emotional expression in social media and other factors, such as online information sharing. This work develops a standardized annotation scheme for quantifying emotions in social media using recent emotion theory and research. Human annotators assessed both social media posts and their own reactions to the posts’ content on scales of 0 to 100 for each of 20 (Study 1) and 23 (Study 2) emotions. For Study 1, we analyzed English-language posts from Twitter (N = 244) and YouTube (N = 50). Associations between emotion ratings and text-based measures (LIWC, VADER, EmoLex, NRC-EIL, Emotionality) demonstrated convergent and discriminant validity. In Study 2, we tested an expanded version of the scheme in-country, in-language, on Polish (N = 3648) and Lithuanian (N = 1934) multimedia Facebook posts. While the correlations were lower than with English, patterns of convergent and discriminant validity with EmoLex and NRC-EIL still held. Coder reliability was strong across samples, with intraclass correlations of .80 or higher for 10 different emotions in Study 1 and 16 different emotions in Study 2. Compared with prior schemes, this research improves the measurement of emotions in social media by including more dimensions, multimedia, and context.
Notes
A full list of lexicons from Dr. Mohammad and colleagues is available here: http://saifmohammad.com/WebPages/lexicons.html.
Lemmatization is similar to stemming—it reduces each wordform to its citation form in the dictionary: e.g., hoping, hoped to hope; brought to bring. This is more important for morphologically rich languages like Polish and Lithuanian than for morphologically poor ones like English.
For this and subsequent correlation tables, we included tables of p values in Appendix Tables 11, 12, 13, 14, 15, and 16. However, note that we are using correlations descriptively rather than inferentially, so we are interested in direction and magnitude rather than significance, which varies with sample size.
The Polish bicultural social science researcher is fluent in English, Polish, and Russian; the Lithuanian researcher is fluent in English, Lithuanian, Russian, French, and German.
Delfi.lt, “Lietuvos įtakingiausieji 2018” (“Lithuania’s Most Influential 2018”), https://www.delfi.lt/apps/itakingiausieji2018/bendras/balsas.
https://nvoatlasas.lt/en/filtering/, Accessed on Aug 2, 2018.
To our delight, he describes kama muta, which we had already included in this study, as an example of an emotion that does not rely on English vernacular but is a meaningful construct to study.
Demszky et al. (2020) note, “We find that all Cohen’s kappa values are greater than 0, showing rater agreement” (p. 4051), also determine reliability using Spearman’s rho correlations, and refer to their interrater agreement as high.
Some methods go beyond keywords to use sophisticated machine learning paradigms, such as transformers (see e.g., Acheampong et al., 2021, for a review), but we will not discuss them here as they are less interpretable than keyword approaches, and it is not clear how their performance adds to theory of emotion.
One proprietary exception is Empath, not to be confused with the text-based Empath system described elsewhere. https://webempath.com
A full list of lexicons from Dr. Mohammad and colleagues is available here: http://saifmohammad.com/WebPages/lexicons.html.
References
Acheampong, F. A., Nunoo-Mensah, H., & Chen, W. (2021). Transformer models for text-based emotion detection: a review of BERT-based approaches. Artificial Intelligence Review, 54(8), 5789–5829.
Alm, C. O., Roth, D., Sproat, R. (2005). Emotions from text: machine learning for text-based emotion prediction. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing - HLT ’05, 579–586. https://doi.org/10.3115/1220575.1220648
Aman, S., & Szpakowicz, S. (2007). Identifying expressions of emotion in text. Text, Speech and Dialogue, 4629, 196–205. https://doi.org/10.1007/978-3-540-74628-7_27
Arif, A., Stewart, L. G., Starbird, K. (2018). Acting the part: examining information operations within #BlackLivesMatter discourse. In Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), 1–27. https://doi.org/10.1145/3274289
Barfar, A. (2019). Cognitive and affective responses to political disinformation in Facebook. Computers in Human Behavior, 101, 173–179. https://doi.org/10.1016/j.chb.2019.07.026
Barrett, L. F. (2006). Are emotions natural kinds? Perspectives on Psychological Science, 1, 28–58. https://doi.org/10.1111/j.1745-6916.2006.00003.x
Barrett, L. F., Mesquita, B., Ochsner, K. N., & Gross, J. J. (2007). The experience of emotion. Annual Review of Psychology, 58, 373–403. https://doi.org/10.1146/annurev.psych.58.110405.085709
Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A. M., & Pollak, S. D. (2019). Emotional expressions reconsidered: Challenges to inferring emotion from human facial movements. Psychological Science in the Public Interest, 20, 1–68. https://doi.org/10.1177/1529100619832930
Bartholomew, K., Henderson, A. J. Z., & Marcia, J. E. (2000). Coded semistructured interviews in social psychological research. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 286–312). Cambridge University Press.
Berger, J., & Milkman, K. (2012). What makes online content viral? Journal of Marketing Research, 49(2), 192–205. https://doi.org/10.1509/jmr.10.0353
Beskow, D. M., & Carley, K. M. (2019). Social cybersecurity: An emerging national security requirement. Military Review, 99, 117–126.
Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100(2), 431–449. https://doi.org/10.1037/a0038047
Bostan, L.-A.-M., Klinger, R. (2018). An analysis of annotated corpora for emotion classification in text. In Proceedings of the 27th international conference on computational linguistics, pp. 2104–2119.
Brady, W. J., Wills, J. A., Jost, J. T., Tucker, J. A., & Van Bavel, J. J. (2017). Emotion shapes the diffusion of moralized content in social networks. Proceedings of the National Academy of Sciences, 114(28), 7313–7318. https://doi.org/10.1073/pnas.1618923114
Canales, L., Daelemans, W., Boldrini, E., & Martinez-Barco, P. (2019). EmoLabel: Semi-automatic methodology for emotion annotation of social media text. IEEE Transactions on Affective Computing, 14, 579–591. https://doi.org/10.1109/TAFFC.2019.2927564
Chen, J., Yan, Y., & Leach, J. (2022). Are emotion-expressing messages more shared on social media? A meta-analytic review. Review of Communication Research, 10, 59–79. https://doi.org/10.12840/ISSN.2255-4165.034
Chen, E. (2022, July 11). 30% of Google’s emotions dataset is mislabeled. The Surge AI Blog. https://www.surgehq.ai//blog/30-percent-of-googles-reddit-emotions-dataset-is-mislabeled
Chess, S., & Shaw, A. (2015). A conspiracy of fishes, or, how we learned to stop worrying about #GamerGate and embrace hegemonic masculinity. Journal of Broadcasting & Electronic Media, 59(1), 208–220. https://doi.org/10.1080/08838151.2014.999917
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290. https://doi.org/10.1037/1040-3590.6.4.284
Clemente, F. M., Rabbani, A., & Araújo, J. P. (2019). Ratings of perceived recovery and exertion in elite youth soccer players: Interchangeability of 10-point and 100-point scales. Physiology & Behavior, 210, 112641.
Cowen, A. S., & Keltner, D. (2017). Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proceedings of the National Academy of Sciences, 114(38), E7900–E7909. https://doi.org/10.1073/pnas.1702247114
Cowen, A. S., & Keltner, D. (2020). What the face displays: Mapping 28 emotions conveyed by naturalistic expression. American Psychologist, 75(3), 349–364. https://doi.org/10.1037/amp0000488
Cowen, A. S., & Keltner, D. (2021). Semantic space theory: A computational approach to emotion. Trends in Cognitive Sciences, 25(2), 124–136. https://doi.org/10.1016/j.tics.2020.11.004
Cowen, A. S., Elfenbein, H. A., Laukka, P., & Keltner, D. (2019). Mapping 24 emotions conveyed by brief human vocalization. American Psychologist, 74(6), 698–712. https://doi.org/10.1037/amp0000399
Cowen, A. S., Laukka, P., Elfenbein, H. A., Liu, R., & Keltner, D. (2019). The primacy of categories in the recognition of 12 emotions in speech prosody across two cultures. Nature Human Behaviour, 3, 369–382.
Cowen, A. S., Sauter, D., Tracy, J. L., & Keltner, D. (2019). Mapping the passions: Towards a high-dimensional taxonomy of emotional experience and expression. Psychological Science in the Public Interest, 20(1), 69–90. https://doi.org/10.1177/1529100619850176
DataReportal. (2019, January 31). Digital 2019: Lithuania. Retrieved from https://datareportal.com/reports/digital-2019-lithuania
DataReportal. (2020, February 18). Digital 2020: Poland. Retrieved from https://datareportal.com/reports/digital-2020-poland (data from Global Web Index).
DataReportal. (2023, January 26). Global overview report. Retrieved from https://datareportal.com/reports/digital-2023-global-overview-report
Dawes, J. (2008). Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50(1), 61–104.
Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., Ravi, S. (2020). GoEmotions: A dataset of fine-grained emotions. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4040–4054). ArXiv:2005.00547v2.
Devillers, L., Vidrascu, L., & Lamel, L. (2005). Challenges in real-life emotion annotation and machine learning based detection. Neural Networks, 18(4), 407–422. https://doi.org/10.1016/j.neunet.2005.03.007
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6(3–4), 169–200. https://doi.org/10.1080/02699939208411068
Ekman, P., & Cordaro, D. (2011). What is meant by calling emotions basic. Emotion Review, 3(4), 364–370. https://doi.org/10.1177/1754073911410740
Elfenbein, H. A., & Ambady, N. (2002). On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychological Bulletin, 128(2), 203–235. https://doi.org/10.1037//0033-2909.128.2.203
Emmons, R. A., & McCullough, M. E. (2003). Counting blessings versus burdens: An experimental investigation of gratitude and subjective well-being in daily life. Journal of Personality and Social Psychology, 84(2), 377–389. https://doi.org/10.1037/0022-3514.84.2.377
Fiske, A. P. (2020). The lexical fallacy in emotion research: Mistaking vernacular words for psychological entities. Psychological Review, 127(1), 95–113. https://doi.org/10.1037/rev0000174
Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019, January 26). IRR: Various coefficients of interrater reliability and agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr
Gendron, M., Hoemann, K., Crittenden, A. N., Mangola, S. M., Ruark, G. A., & Barrett, L. F. (2020). Emotion perception in Hadza hunter-gatherers. Scientific Reports, 10, 3867. https://doi.org/10.1038/s41598-020-60257-2
Goetz, J. L., Spencer-Rodgers, J., & Peng, K. (2008). Dialectical emotions: How cultural epistemologies influence the experience and regulation of emotional complexity. In R. M. Sorrentino & S. Yamaguchi (Eds.), Handbook of motivation and cognition across cultures (pp. 517–539). Academic Press.
Goetz, J. L., Keltner, D., & Simon-Thomas, E. (2010). Compassion: An evolutionary analysis and empirical review. Psychological Bulletin, 136(3), 351–374. https://doi.org/10.1037/a0018807
Golonka, E. M., Jones, K. M., Sheehan, P., Pandža, N. B., Paletz, S. B. F., Rytting, C. A., & Johns, M. (2023). The construct of cuteness: A validity study for measuring content and evoked emotions on social media. Frontiers in Psychology, 14, 1068373. https://doi.org/10.3389/fpsyg.2023.1068373
Hipson, W. E., Mohammad, S. M. (2020). PoKi: A large dataset of poems by children. In Proceedings of the 12th conference on language resources and evaluation (LREC 2020), pp. 1578–1589.
Hofmann, J., Troiano, E., Sassenberg, K., & Klinger, R. (2020). Appraisal theories for emotion classification in text. In Proceedings of the 28th international conference on computational linguistics, 125–138.
Hutto, C. J. (2018). VADER sentiment analysis ReadMe file. GitHub. https://github.com/cjhutto/vaderSentiment
Hutto, C. J., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the eighth international conference on weblogs and social media (ICWSM-14), pp. 216–225. https://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/download/8109/8122/
Immordino-Yang, M. H., McColl, A., Damasio, H., & Damasio, A. (2009). Neural correlates of admiration and compassion. PNAS, 106, 8021–8026. https://doi.org/10.1073/pnas.0810363106
Jiang, S., Wilson, C. (2018). Linguistic signals under misinformation and fact-checking: Evidence from user comments on social media. In Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), pp. 1–23. https://doi.org/10.1145/3274351
Keltner, D. (1995). Signs of appeasement: Evidence for the distinct displays of embarrassment, amusement, and shame. Journal of Personality and Social Psychology, 68(3), 441–454. https://doi.org/10.1037/0022-3514.68.3.441
Keltner, D., & Haidt, J. (2003). Approaching awe, a moral, spiritual, and aesthetic emotion. Cognition & Emotion, 17(2), 297–314. https://doi.org/10.1080/02699930302297
Kross, E., Verduyn, P., Boyer, M., Drake, B., Gainsburg, I., Vickers, B., Ybarra, O. et al. (2019). Does counting emotion words on online social networks provide a window into people’s subjective experience of emotion? A case study on Facebook. Emotion, 19, 97–107. https://doi.org/10.1037/emo0000416
Marler, P. (1977). The evolution of communication. In T. A. Sebeok (Ed.), How animals communicate (pp. 45–70). Indiana University Press.
Marler, P., & Evans, C. (1997). Animal sounds and human faces: Do they have anything in common? In J. A. Russell & J. M. Fernández-Dols (Eds.), Studies in emotion and social interaction, 2nd series. The psychology of facial expression (pp. 133–157). Cambridge University Press. https://doi.org/10.1017/CBO9780511659911
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749. https://doi.org/10.1037/0003-066X.50.9.741
Mohammad, S. M., & Kiritchenko, S. (2015). Using hashtags to capture fine emotion categories from tweets. Computational Intelligence, 31(2), 301–326.
Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3), 436–465.
Mohammad, S. M., Zhu, X., Kiritchenko, S., & Martin, J. (2015). Sentiment, emotion, purpose, and style in election tweets. Information Processing & Management, 51(4), 480–499. https://doi.org/10.1016/j.ipm.2014.09.003
Mohammad, S. M., Bravo-Marquez, F. (2017). WASSA-2017 shared task on emotion intensity. In Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp. 34–49.
Mohammad, S. M. (2018). Word affect intensities. In Proceedings of the 11th edition of the language resources and evaluation conference (LREC-2018), pp. 174–183.
Munezero, M., Montero, C. S., Sutinen, E., & Pajunen, J. (2014). Are they different? Affect, feeling, emotion, sentiment, and opinion detection in text. IEEE Transactions on Affective Computing, 5(2), 101–111. https://doi.org/10.1109/TAFFC.2014.2317187
Murauskaite, E. E., Johns, M. A., Paletz, S. B. F., & Pandža, N. B. (in press). How does it feel to talk about Russia? Emotions and themes in Russia-related social media posts in Lithuania. Journal of Baltic Studies.
Novielli, N., Calefato, F., Lanubile, F. (2018). A gold standard for emotion annotation in stack overflow. In Proceedings of the 15th International Conference on Mining Software Repositories, pp. 14–17. https://doi.org/10.1145/3196398.3196453
Oberländer, L. A. M., Kim, E., & Klinger, R. (2020). GoodNewsEveryone: A corpus of news headlines annotated with emotions, semantic roles, and reader perception. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 1554–1566). https://aclanthology.org/2020.lrec-1.194/
Ortony, A. (2022). Are all “basic emotions” emotions? A problem for the (basic) emotions construct. Perspectives on Psychological Science, 17, 41–61. https://doi.org/10.1177/1745691620985415
Paletz, S. B. F. (Ed.). (2018). Measuring emotions in social media: Examining the relationship between emotional content and propagation. [Report submitted to the United States Government]. University of Maryland Center for Advanced Study of Language.
Paletz, S. B. F., Auxier, B. E., & Golonka, E. M. (2019). A multidisciplinary framework of information propagation online. Springer Nature. https://doi.org/10.1007/978-3-030-16413-3
Paletz, S. B. F., Golonka, E. M., Stanton, G., Murauskaite, E., Ryan, D., Rytting, C. A., Bradley, P. (2020). Emotion annotation guide for social media, Version 3.32. UMD Applied Research Laboratory for Intelligence and Security.
Paletz, S. B. F., Golonka, E. M., Stanton, G., Murauskaite, E., Ryan, D., Rytting, C. A., Bradley, P. (2022a). Social Media Emotions Annotation Guide (SMEmo), Version 4.0. UMD Applied Research Laboratory for Intelligence and Security.
Paletz, S. B. F., Golonka, E. M., Murauskaite, E. E., Pandža, N. B., Stanton, G., Ryan, D., Johns, M. et al. (2022b). Adapting an emotion annotation guide from the US to Poland and Lithuania. In 26th International Congress of the International Association for Cross-Cultural Psychology, virtual conference. https://iaccp2022.com/wp-content/uploads/2022/07/03072022_Oral_Thematic-Discussions_sorted.pdf
Paletz, S. B. F., Johns, M. A., Murauskaite, E. E., Golonka, E. M., Pandža, N. B., Rytting, C. A., Buntain, C. et al. (in press). Emotional content and sharing on Facebook: A theory cage match. Science Advances.
Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A., & Booth, R. J. (2007). The development and the psychometric properties of LIWC2007. LIWC.net.
Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. University of Texas at Austin. https://doi.org/10.15781/T29G6Z
Pennebaker Conglomerates. (2016). LIWC: How it works. Retrieved June 15, 2016 from http://liwc.wpengine.com/how-it-works/
Peters, K., Kashima, Y., & Clark, A. (2009). Talking about others: Emotionality and the dissemination of social information. European Journal of Social Psychology, 39(2), 207–222. https://doi.org/10.1002/ejsp.523
Plutchik, R. (1962). The emotions: Facts, theories, and a new model. Random House.
Plutchik, R. (2001). The nature of emotions. American Scientist, 89, 344–350.
Poria, S., Cambria, E., Bajpai, R., & Hussain, A. (2017). A review of affective computing: From unimodal analysis to multimodal fusion. Information Fusion, 37, 98–125. https://doi.org/10.1016/j.inffus.2017.02.003
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104(1), 1–15.
Rocklage, M. D., Rucker, D. D., & Nordgren, L. F. (2018). The Evaluative Lexicon 2.0: The measurement of emotionality, extremity, and valence in language. Behavior Research Methods, 50, 1327–1344. https://doi.org/10.3758/s13428-017-0975-6
Rocklage, M. (2022). The lexical suite. Retrieved November 9, 2022 from http://www.lexicalsuite.com/.
Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172. https://doi.org/10.1037/0033-295X.110.1.145
Russell, J. A. (2014). Four perspectives on the psychology of emotion: An introduction. Emotion Review, 6(4), 291. https://doi.org/10.1177/1754073914534558
Schaeffer, K. (2019, December 20). U.S. has changed in key ways in the past decade, from tech use to demographics. Pew Research Center. https://www.pewresearch.org/fact-tank/2019/12/20/key-ways-us-changed-in-past-decade/
Scherer, K. R., & Wallbott, H. (1994). Evidence for universality and cultural variation of differential emotion response patterning. Journal of Personality and Social Psychology, 66(2), 310–328.
Schimmack, U., Oishi, S., & Diener, E. (2002). Cultural influences on the relation between pleasant emotions and unpleasant emotions: Asian dialectic philosophies or individualism-collectivism? Cognition and Emotion, 16(6), 705–719. https://doi.org/10.1080/02699930143000590
Schuff, H., Barnes, J., Mohme, J., Padó, S., & Klinger, R. (2017). Annotation, modelling and analysis of fine-grained emotions on a stance and sentiment detection corpus. In Proceedings of the 8th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp. 13–23. https://doi.org/10.18653/v1/W17-5203
Sedikides, C., Wildschut, T., Arndt, J., & Routledge, C. (2008). Nostalgia: Past, present, and future. Current Directions in Psychological Science, 17, 304–307.
Shiota, M. N., Keltner, D., & Mossman, A. (2007). The nature of awe: Elicitors, appraisals, and effects on self-concept. Cognition and Emotion, 21(5), 944–963. https://doi.org/10.1080/02699930600923668
Shiota, M. N., Campos, B., Oveis, C., Hertenstein, M. J., Simon-Thomas, E., & Keltner, D. (2017). Beyond happiness: Building a science of discrete positive emotions. American Psychologist, 72(7), 617–643. https://doi.org/10.1037/a0040456
Silvia, P. (2009). Looking past pleasure: Anger, confusion, disgust, pride, surprise, and other unusual aesthetic emotions. Psychology of Aesthetics, Creativity, and the Arts, 3(1), 48–51. https://doi.org/10.1037/a0014632
Smith, C. P. (2000). Content analysis and narrative analysis. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 313–335). Cambridge University Press.
Spencer-Rodgers, J., Peng, K., & Wang, L. (2010). Dialecticism and the co-occurrence of positive and negative emotions across cultures. Journal of Cross-Cultural Psychology, 41(1), 109–115. https://doi.org/10.1177/0022022109349508
Stark, L., Hoey, J. (2020). The ethics of emotion in AI Systems. https://doi.org/10.31219/osf.io/9ad4u
Steinnes, K. K., Blomster, J. K., Seibt, B., Zickfeld, J. H., & Fiske, A. P. (2019). Too cute for words: Cuteness evokes the heartwarming emotion of kama muta. Frontiers in Psychology, 10, 387. https://doi.org/10.3389/fpsyg.2019.00387
Stieglitz, S., & Dang-Xuan, L. (2013). Emotions and information diffusion in social media—Sentiment of microblogs and sharing behavior. Journal of Management Information Systems, 29(4), 217–248. https://doi.org/10.2753/MIS0742-1222290408
Strapparava, C., & Mihalcea, R. (2007). SemEval-2007 Task 14: Affective text. In Proceedings of the 4th international workshop on semantic evaluations (SemEval-2007), pp. 70–74. https://aclanthology.org/S07-1013.pdf
Strapparava, C., & Valitutti, A. (2004). WordNet-Affect: An affective extension of WordNet. In Proceedings of the fourth international conference on language resources and evaluation (LREC 2004), pp. 1083–1086. http://www.lrec-conf.org/proceedings/lrec2004/pdf/369.pdf
Sun, J., Schwartz, H. A., Son, Y., Kern, M. L., & Vazire, S. (2020). The language of well-being: Tracking fluctuations in emotion experience through everyday speech. Journal of Personality and Social Psychology: Personality Processes and Individual Differences, 118, 364–387. https://doi.org/10.1037/pspp0000244
Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54. https://doi.org/10.1177/0261927X09351676
Troiano, E., Padó, S., & Klinger, R. (2019). Crowdsourcing and validating event-focused emotion corpora for German and English. In Proceedings of the 57th annual meeting of the association for computational linguistics, pp. 4005–4011.
Trujillo, J. P., & Holler, J. (2023). Interactionally embedded gestalt principles of multimodal human communication. Perspectives on Psychological Science.
van Atteveldt, W., van der Velden, M. A. C. G., & Boukes, M. (2021). The validity of sentiment analysis: Comparing manual annotation, crowd-coding, dictionary approaches, and machine learning algorithms. Communication Methods and Measures, 15, 121–140. https://doi.org/10.1080/19312458.2020.1869198
van de Vijver, F. J. R., & Leung, K. (2021). Methods and data analysis for cross-cultural research (2nd ed.). Sage.
Vega, M. Y., Klukas, E., & Dabbah, A. I. (2014). #Retweet this: HIV stigma in the twitterverse. International AIDS Conference.
Volkova, E. P., Mohler, B. J., Meurers, D., Gerdemann, D., & Bülthoff, H. H. (2010). Emotional perception of fairy tales: Achieving agreement in emotion annotation of text. In Proceedings of the NAACL Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, Los Angeles, CA, pp. 98–106. https://www.aclweb.org/anthology/W10-0212
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146–1151. https://doi.org/10.1126/science.aap9559
Walker, L., Baines, P. R., Dimitriu, R., & Macdonald, E. K. (2017). Antecedents of retweeting in a (political) marketing context. Psychology & Marketing, 34(3), 275–293. https://doi.org/10.1002/mar.20988
Wang, Y., Callan, J., & Zheng, B. (2015). Should we use the sample? Analyzing datasets sampled from Twitter’s stream API. ACM Transactions on the Web, 9, 1–23. https://doi.org/10.1145/2746366
Wang, W., Chen, L., Thirunarayan, K., Sheth, A. P. (2012). Harnessing Twitter “big data” for automatic emotion identification. In 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, pp. 587–592. https://doi.org/10.1109/SocialCom-PASSAT.2012.119
Watson, D., & Clark, L. A. (1994). The PANAS-X: Manual for the positive and negative affect schedule – Expanded form. University of Iowa. https://doi.org/10.17077/48vt-m4t2
Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98(2), 219–235. https://doi.org/10.1037/0033-2909.98.2.219
Weber, R. P. (1990). Basic content analysis (2nd ed.). Sage.
Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2–3), 165–210. https://doi.org/10.1007/s10579-005-7880-9
Wildschut, T., Bruder, M., Robertson, S., van Tilburg, W. A. P., & Sedikides, C. (2014). Collective nostalgia: A group-level emotion that confers unique benefits on the group. Journal of Personality and Social Psychology, 107(5), 844–863.
Wispé, L. (1986). The distinction between sympathy and empathy: To call forth a concept, a word is needed. Journal of Personality and Social Psychology, 50(2), 314–321. https://doi.org/10.1037/0022-3514.50.2.314
Woolley, S. C., & Howard, P. N. (2017). Computational propaganda worldwide: Executive summary. Working Paper No. 2017.11. The Computational Propaganda Project, Oxford Internet Institute, University of Oxford. http://comprop.oii.ox.ac.uk/wp-content/uploads/sites/89/2017/06/Casestudies-ExecutiveSummary.pdf
Wright, C. L., & Rubin, M. (2017). “Get lucky!” Sexual content in music lyrics, videos and social media and sexual cognitions and risk among emerging adults in the USA and Australia. Sex Education, 17, 41–56. https://doi.org/10.1080/14681811.2016.1242402
Yadollahi, A., Shahraki, A. G., & Zaiane, O. R. (2017). Current state of text sentiment analysis from opinion to emotion mining. ACM Computing Surveys, 50, 25. https://doi.org/10.1145/3057270
Zaśko-Zielińska, M., & Piasecki, M. (2018). Towards emotive annotation in plWordNet 4.0. In Proceedings of the 9th Global Wordnet Conference (pp. 153–162). https://aclanthology.org/volumes/2018.gwc-1/
Zhang, R., & Liu, N. (2014). Recognizing humor on Twitter. In Proceedings of the 23rd ACM international conference on conference on information and knowledge management (pp. 889–898).
Zhang, Y., Weninger, F., Schuller, B., & Picard, R. (2019). Holistic affect recognition using PaNDA: Paralinguistic Non-metric Dimensional Analysis. IEEE Transactions on Affective Computing. https://doi.org/10.1109/TAFFC.2019.2961881
Zickfeld, J. H., Schubert, T. W., Seibt, B., Blomster, J. K., Arriaga, P., Basabe, N., Blaut, A., Caballero, A., Carrera, P., Dalgar, I., Ding, Y., Dumont, K., Gaulhofer, V., Gračanin, A., Gyenis, R., Hu, C.-P., Kardum, I., Lazarević, L. B., Mathew, L., ..., Fiske, A. P. (2019). Kama muta: Conceptualizing and measuring the experience often labelled being moved across 19 nations and 15 languages. Emotion, 19(3), 402–424. https://doi.org/10.1037/emo0000450
Author Note
Susannah Paletz is at the College of Information Studies, University of Maryland, College Park, affiliated with two University of Maryland centers: the Social Data Science Center (SoDa) and the Applied Research Laboratory for Intelligence and Security (ARLIS), formerly the Center for Advanced Study of Language (CASL). For Study 1, Drs. Paletz, Golonka, Adams, and Bradley were all at CASL. Ms. Stanton and Mr. Ryan were interns at CASL through the START internship program at the University of Maryland, College Park. Ewa Golonka, Nick B. Pandža, and C. Anton Rytting are currently at ARLIS. David Ryan is at Stanford University in the Computer Science Department and the Feminist, Gender and Sexuality Studies Department. Egle E. Murauskaite is at the ICONS project at the University of Maryland, College Park. Michael Johns was at ARLIS during this work but is now at the Institute for Systems Research, University of Maryland. Cody Buntain was at the New Jersey Institute of Technology during data collection but is now at the College of Information Studies, University of Maryland, College Park. We have no known conflicts of interest to disclose. This material is based on work supported, in whole or in part, with funding from the United States Government Office of Naval Research (ONR) grant 12398640 and Minerva Research Initiative / ONR grant #N00014-19-1-2506. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the University of Maryland, College Park and/or any agency or entity of the US Government. A presentation describing some of Study 2’s reliability information and the adaptation of the annotation guide to Polish and Lithuanian was given at the 26th International Congress of the International Association for Cross-Cultural Psychology (July, 2022). The authors are grateful to Susan Campbell, Brooke Auxier, and two prior anonymous reviewers for their suggestions on an earlier version of this paper, as well as to Nataliya Stepanova for her assistance with sampling for Study 2. We are also deeply grateful to our Study 2 annotators: Agata Bieniek, Anna Kostrzewa, Gabrielė Kundrotaitė, Agata Kuzia, Klaudia Kuźnicka, Małgorzata Perczak-Partyka, Rafał Rosiak, Laura Russak, Austėja Serbentaitė, Ewa Szczepska, Karolina Tokarek, Aurelija Tylaitė, and Marta Urbańska-Łaba.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Practices Statement
Some of the data or materials for the experiments reported here are available. We are uploading, as an electronic supplement, a data file that includes the Tweet IDs and links, YouTube IDs and hyperlinks, and our original coding: https://doi.org/10.3758/s13428-023-02195-1. However, due to the terms of service of the social media platforms, we cannot send (or download) the YouTube videos or the text or images of tweets. We packaged the annotated corpus of Polish and Lithuanian Facebook posts for access with permission on a site at our university, and it is available here: http://hdl.handle.net/1903/29776. We are also very interested in sharing our latest annotation guide upon request. These analyses, being by their nature explicitly exploratory and novel, were not preregistered, but they are available upon request.
Appendices
Appendix 1: Additional relevant literature
Prior annotation of emotion
Several researchers have annotated various corpora for emotion. In the social media problem space, none of the schemes do all of what we are attempting. Other annotation schemes exist outside of social media, some of which we review. Many of these schemes use a much smaller set of emotions based on older theories of emotion (e.g., Alm et al., 2005; Aman & Szpakowicz, 2007; Novielli et al., 2018; Schuff et al., 2017), do not measure intensity within/of emotions (e.g., Alm et al., 2005; Novielli et al., 2018), or use exclusive coding such that only one emotion is chosen per unit annotated (e.g., Alm et al., 2005; Volkova et al., 2010). Even those that measure mixtures of emotions simply tag the annotated unit as “mixed emotion” without measuring the intensity of each (e.g., Aman & Szpakowicz, 2007; see Table 9).
For example, the widely cited article by Wiebe et al. (2005) discusses in detail how to annotate language (e.g., speech events) for opinions and valence. While the title implies an emotion annotation scheme and several examples use emotion words (e.g., “The U.S. fears a spill-over”, p. 173), they simply propose polarity for attitudes or private states (positive, negative, both, and neither), and four levels of intensity of the statement itself (low, medium, high, extreme; e.g., the intensity of “The U.S. fears a spill-over” is listed as medium, p. 174). Wiebe et al. (2005) do cite other emotion taxonomies, but do not incorporate them into their own scheme.
Separately, Alm et al. (2005) created an affective text-to-speech system using stories written by Beatrix Potter, H. C. Andersen, and the Brothers Grimm. They annotated for emotions, expanding Ekman’s six basic emotions by splitting surprise into positive and negative surprise and adding a neutral label (anger, disgust, fear, sadness, happiness, positive surprise, negative surprise, and neutral). Two annotators independently coded each sentence for an emotion or lack thereof. The authors found emotion annotation difficult, with inter-annotator agreement ranging between .24 and .51. Aman and Szpakowicz (2007), expanding on Alm et al. (2005), used a mixture of manual and automated (seed-word) methods to identify Ekman’s six basic emotions (plus mixed emotion and no emotion) sentence by sentence in blog posts. This framework incorporated intensity (low, medium, high) and the possibility of mixed emotion, but not what those mixed emotions were. Their reliability statistics were superior to those of Alm et al. (2005), with average pairwise kappa generally ranging from .60 to .79 (the exception being mixed emotion, at .43). Happiness and fear had the highest reliabilities and surprise the lowest.
Volkova et al. (2010) further expanded the work of Alm et al. (2005) to develop automated tools to detect emotions in text. The authors selected a group of fifteen emotions made up of seven positive (relief, joy, hope, interest, compassion, surprise, and approval), seven negative (disturbance, sadness, despair, disgust, hatred, fear, and anger), and neutral. They also measured intensity on a scale from one (closest to neutral) to five (extreme polarization). Before the main study, the authors had the coders agree on eight related clusters of emotions drawn from the original fifteen (e.g., {joy, approval}; {disgust, anger, hatred}). The coders then annotated lemmatized word lists as to whether each word would change the polarity of a potential context. Only 4% of the items from each word list were scored as having multiple polarities (negative/neutral or positive/neutral) by the same annotator (Volkova et al., 2010). For the main experiment, ten coders assessed eight Grimm fairy tales in Standard German. The Manual Emotion Annotation Tool (MEAT) allowed coders to highlight a string of text and select the emotion code it would convey if a participant were to read the text out loud, which would add nonverbal signals (e.g., facial expressions). This annotation process involved labeling only one of the fifteen emotions per chosen text unit. Within a coder, simultaneous expression of different emotions (emotional complexity) could therefore not be detected using this method. However, given that different coders could choose different and overlapping text units in MEAT, it might be possible to use their method to detect simultaneous and overlapping emotions between coders. These researchers had to create a reliability metric to account for the text unit differing across coders. They also used the previously determined clusters for testing reliability.
More recently, Novielli et al. (2018) created a corpus of annotated emotions using 4800 posts by software developers on Stack Overflow. They examined six emotions overlapping with Ekman’s basic emotions (love, joy, surprise, anger, sadness, and fear). While intensity was not measured (only the presence/absence of each emotion), emotional complexity (i.e., multiple emotion labels) was allowed, and they found that 3% of their posts were labeled with two emotions. Each post was annotated by three coders, and the observed raw agreement was high, from .86 to .98; however, the authors note that the Fleiss’ kappa was much lower, from .30 to .62, owing to the low frequency of some emotions (e.g., surprise).
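This gap between raw agreement and chance-corrected agreement is largely a base-rate effect. As a minimal numeric sketch (in Python, with invented counts rather than Novielli et al.’s actual data), consider 100 posts rated by three coders for the presence of a rare emotion such as surprise:

import numpy as np

# counts[i] = [raters saying "absent", raters saying "present"] for post i
# (invented numbers: the category is rare, and coders mostly agree it is absent)
counts = np.array([[3, 0]] * 95 + [[2, 1]] * 3 + [[1, 2]] * 2)
N = counts.shape[0]          # 100 posts
n = counts.sum(axis=1)[0]    # 3 raters per post

# Mean per-item observed agreement (the "raw" agreement)
P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
P_bar = P_i.mean()

# Chance agreement from the marginal category proportions
p_j = counts.sum(axis=0) / (N * n)
P_e = np.sum(p_j ** 2)

kappa = (P_bar - P_e) / (1 - P_e)
print(f"raw agreement = {P_bar:.2f}, Fleiss' kappa = {kappa:.2f}")
# raw agreement = 0.97, Fleiss' kappa = 0.27

Because nearly every rating falls in the “absent” category, chance agreement alone is about .95, leaving kappa little headroom even though the coders agree on the vast majority of posts.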
Not all emotion annotation schemes rely solely on text. The broad and productive literature on affective computing often includes recognition of emotions in speech and video, although it is usually not geared toward social media posts and may still use limited sets of emotions (see for reviews, Devillers et al., 2005; Poria et al., 2017). Devillers et al. (2005) attempted to overcome the limits of small sets of emotions and created a multimodal annotation scheme using a corpus of recorded and transcribed phone call exchanges from two different types of call centers (financial and medical). This Multi-level Emotion and Context Annotation Scheme (MECAS) was developed for both speech-only and multimodal data and allowed for fine-grained emotions as well as seven coarse-grained categories. Each turn of speech could be annotated with any of 24 labels. The 21 fine-grained emotions were anxiety, stress, fear, panic, annoyance, impatience, cold anger, hot anger, disappointment, sadness, despair, hurt, embarrassment, relief, interest, amusement, surprise, neutral, dismay, resignation, and compassion; MECAS also includes positive, negative, and unknown labels. These emotions were rated for intensity on a five-point scale. Devillers et al. (2005) went beyond typical annotation to acknowledge potential mixtures of emotions, allowing both a major and a minor emotion label for each annotated unit, with the major label being the focal point and the minor label a less obvious but still present element of emotion. These blended emotions were classified into three types: conflictual, when the major and minor emotions have opposite valence (i.e., one negative and one positive); nonconflictual, when they share the same valence (e.g., both positive); and ambiguous, when they belong to the same coarse-grained emotion, a grouping more specific than valence. While MECAS overcomes many of the issues we criticize in other schemes (it measures intensity, includes many emotions, and allows more than one emotion at a time), it separates into different categories some emotions that we would consider different levels of intensity of the same emotion (e.g., annoyance and anger), and it omits other emotions that might be important for annotation in social media contexts (e.g., contempt, pride). It also allows the co-occurrence of at most two emotions. Further, as with the others reviewed here, because these annotation schemes were not created for social media, they do not distinguish between the personal reactions of the coders and the content of the text.
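The three blend types amount to a simple decision rule over the valence and coarse-grained category of the major and minor labels. A schematic Python sketch follows, using an invented toy mapping from labels to valence and coarse category rather than Devillers et al.’s actual tables:

# Toy mappings for illustration only (not MECAS's actual label tables)
VALENCE = {"relief": "pos", "amusement": "pos", "annoyance": "neg",
           "hot anger": "neg", "cold anger": "neg", "sadness": "neg"}
COARSE = {"relief": "relief", "amusement": "amusement", "annoyance": "anger",
          "hot anger": "anger", "cold anger": "anger", "sadness": "sadness"}

def blend_type(major: str, minor: str) -> str:
    """Classify a major/minor emotion blend per the MECAS scheme described above."""
    if COARSE[major] == COARSE[minor]:
        return "ambiguous"      # same coarse-grained emotion
    if VALENCE[major] != VALENCE[minor]:
        return "conflictual"    # opposite valence
    return "nonconflictual"     # same valence, different coarse-grained emotion

print(blend_type("hot anger", "annoyance"))  # ambiguous
print(blend_type("relief", "sadness"))       # conflictual
print(blend_type("annoyance", "sadness"))    # nonconflictual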
Some of these annotation guides were the basis of attempts to create keyword-based or automated measures for emotions (e.g., Aman & Szpakowicz, 2007), and additional corpora that are limited to sentiment and/or Ekman/Plutchik lists of emotion abound (see review in Schuff et al., 2017). Given the growing importance of and problems with automated ways of measuring emotion (Stark & Hoey, 2020), we discuss some of the more popular automated measures.
Popular automated assessments of sentiment and emotion
There exist many automated, text-based methods to detect sentiment and emotion (e.g., Stieglitz & Dang-Xuan, 2013; Strapparava & Valitutti, 2004).Footnote 11 This section describes some of the methods mentioned in the main text, along with others, in more methodological detail. There has long been interest in automating the detection of emotion in text (e.g., Strapparava & Mihalcea, 2007; see Bostan & Klinger, 2018, for a review). Automated measures are often deployed because they can scale to a huge number of posts and are relatively easy to use, unlike annotation, which is time-consuming even when not using trained annotators who understand the language and culture of the social media platform. Despite these realistic constraints that lead many researchers to use automated methods, it is important to point out that many of these metrics rely on emotion lists that are outdated or brief, and they thus may be limited with regard to social media in particular (e.g., Plutchik’s list; see Mohammad et al., 2015; see also Bostan & Klinger, 2018, and Schuff et al., 2017, for reviews). By their nature, these metrics currently rely on text-based assessments, and so simply cannot cover as many modalities as the annotation of an entire post that includes multimedia content (for a review of text-based emotion mining, see Yadollahi et al., 2017).Footnote 12
One of the most popular, Linguistic Inquiry and Word Count (LIWC), is used to score sentiment and a small set of emotions. LIWC assesses text by counting words from a dictionary of specific word lists (Pennebaker et al., 2015). LIWC has a general affect word list, as well as separate positive and negative affect word lists (Tausczik & Pennebaker, 2010). The negative affect word list includes words from separate anger, sadness, and anxiety lists, but extends beyond them; the positive affect list includes a set of words such as love, nice, and sweet (Pennebaker et al., 2015). LIWC is a relatively easy-to-use, inexpensive program that can be applied to a range of different document types. The different categories were validated by comparing expert judges’ ratings of each individual word as to its fit within the categories (Pennebaker et al., 2007), and internal reliability was tested based on co-occurrence of words (corrected internal consistency alphas = .64 for positive affect, .55 for negative affect, .73 for anxiety, .53 for anger, and .70 for sadness; Pennebaker et al., 2015). The creators of LIWC initially warned against its use in short texts, noting it was designed to be used with at least 50 words (Pennebaker Conglomerates, 2016). This limitation may have made it less appropriate for analyzing short social media texts such as tweets, which historically were limited to 140 characters and even now allow only 280. LIWC has been used widely to study social media, including examining emotional and other types of content in comments on disinformation versus true news in Facebook posts (e.g., Barfar, 2019; though see below). More recent versions (starting with 2015) include ‘netspeak’ and are more geared toward social media analysis.
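To make the word-count mechanism concrete, the following toy Python scorer mimics the general dictionary approach; the two mini word lists are invented stand-ins, not LIWC’s proprietary dictionaries, and real LIWC additionally matches wildcard stems (e.g., happi*) across many more categories:

import re

# Invented mini word lists standing in for LIWC's proprietary dictionaries
CATEGORIES = {
    "posemo": {"love", "nice", "sweet", "happy"},
    "negemo": {"hate", "kill", "sad", "angry"},
}

def category_rates(text: str) -> dict:
    """Return the percentage of words in `text` that hit each category list."""
    words = re.findall(r"[a-z']+", text.lower())
    return {cat: 100 * sum(w in lexicon for w in words) / max(len(words), 1)
            for cat, lexicon in CATEGORIES.items()}

print(category_rates("I love this sweet video but I hate the comments"))
# {'posemo': 20.0, 'negemo': 10.0}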
As alluded to above, the National Research Council (NRC) provides several manually annotated sentiment and emotion lexicons,Footnote 13 of which the oldest and best known is the NRC Emotion Lexicon, or EmoLex (Mohammad & Turney, 2013). EmoLex consists of human annotations of positive and negative sentiment and Plutchik’s (1962, 2001) eight basic emotions—joy, sadness, anger, fear, disgust, surprise, trust, and anticipation—for 14,200 English word types, of which 4462 are annotated as being associated with at least one of the eight emotions. More recently, the NRC released a new lexicon building on EmoLex, the NRC Emotion/Affect Intensity Lexicon (NRC-EIL; Mohammad, 2018). Unlike the simple Boolean (binary) annotations in EmoLex, the NRC-EIL provides scalar intensities of emotional association for nearly 10,000 words either chosen from EmoLex or frequently co-occurring with other emotional words. Although both lexicons were annotated solely by English-speaking annotators rating English words, the NRC provides automatic translations of the original English lexicons into over a hundred languages, including Polish and Lithuanian. As with other lexicons, this method leaves out many emotions (Cowen & Keltner, 2017, 2021).
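Because EmoLex’s annotations are Boolean, applying it reduces to lexicon lookup and counting. Here is a minimal Python sketch; it assumes the word-level EmoLex file is distributed as tab-separated word–emotion–flag triples (verify the layout and terms of use of your own copy; the path below is illustrative):

import re
from collections import Counter, defaultdict

def load_emolex(path: str) -> dict:
    """Map each word to its set of associated emotion/sentiment labels."""
    lexicon = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            word, emotion, flag = line.rstrip("\n").split("\t")
            if flag == "1":
                lexicon[word].add(emotion)
    return lexicon

def emotion_counts(text: str, lexicon: dict) -> Counter:
    """Count tokens associated with each label in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(label for tok in tokens for label in lexicon.get(tok, ()))

# Illustrative path to a local copy of the word-level lexicon:
# lexicon = load_emolex("NRC-Emotion-Lexicon-Wordlevel-v0.92.txt")
# print(emotion_counts("what a horrifying, tragic story", lexicon))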
More promising is VADER (Valence Aware Dictionary for Sentiment Reasoning; Hutto & Gilbert, 2014), which was created by constructing a list of over 9000 lexical feature candidates, including emoticons and slang, and then collecting intensity and polarity ratings of those features from ten independent raters. Several iterations of assessments, including with two human experts, were used to qualitatively identify properties of text that would affect perceived sentiment intensity (e.g., capitalization, certain punctuation). The final version of VADER, which measures positive and negative sentiment, thus went beyond a word list to incorporate rules and heuristics garnered by humans.
Hutto and Gilbert (2014) compared VADER’s coding performance to that of seven sentiment analysis lexicons, including LIWC. VADER performed as well as individual human raters at matching the aggregated mean from 20 coders for sentiment intensity for each tweet (text-only), which was their measure of ground truth (r = .88). When measured on a three-way sentiment classification task, VADER had a precision score of .99, a recall score of .94, and an overall F-score of .96 (Hutto & Gilbert, 2014). LIWC scored lower on both intensity matching and classification, with a correlation with the social media text gold standard of r = .62, precision of .94, recall of .48, and an overall F-score of .63. Thus, VADER may be superior to LIWC and other lexical measures for examining sentiment polarity in Twitter data. However, VADER does not measure specific emotions, only positivity versus negativity.
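VADER itself is distributed as the open-source Python package vaderSentiment, so a usage sketch is straightforward (the comments below describe the output format rather than exact score values):

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
# VADER's rules react to capitalization, punctuation, and emoticons,
# not just dictionary membership.
for text in ["This is great", "This is GREAT!!!", "This is great :("]:
    print(text, analyzer.polarity_scores(text))
# Each result is a dict of neg/neu/pos proportions plus a normalized
# 'compound' score ranging from -1 (most negative) to +1 (most positive).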
A separate lexicon relevant here is the Lexical Suite (http://www.lexicalsuite.com/), which is the second version of the Evaluative Lexicon (Rocklage et al., 2018). In addition to measures of valence that are strongly correlated with LIWC valence, this suite has a measure of emotionality. Emotionality here is described as the degree to which a person’s evaluation or attitude, as expressed in text, is emotional versus more cognitive. Rather than simply expressing how much a person likes or dislikes something, the words capture amounts of emotion, generally speaking (e.g., “fantastic” has a higher emotionality score than “valuable”). The authors created these word lists through a lengthy, iterative process that included judges’ ratings of words. They found that Emotionality was uncorrelated with LIWC valence (−.11) but significantly positively associated with arousal (.43), suggesting discriminant and convergent validity, respectively (Rocklage et al., 2018). Of note, this measure does not distinguish among different emotions (e.g., love from contempt). However, for our convergent/discriminant validity tests, it offers an additional dimension to compare with our annotation scheme.
New lexicons developed using machine learning techniques such as unsupervised clustering overcome some of the issues of context by associating words in vector space. A context problem, as it might occur in a typical lexicon, is when a word’s meaning changes depending on its context. For instance, “kill” is in the list of words for “anger” in LIWC2015; however, there are different emotional connotations for “she killed her,” “she made a killing in the stock market,” and “that meme just killed me.” Jiang and Wilson (2018) created ComLex in part because they wanted a context-specific lexicon, and so developed 300 categories based on over 2 million user comments on social media (Facebook, Twitter, and YouTube). However, although the authors frequently refer to the clusters as linguistic signals of emotion, these categories are neither based on top-down theoretical constructs of emotion nor intuitive lay categories of emotion. Instead, the dimensions are clusters of words and/or symbols that are then given a category label. Some of the emotion lists are mainly just emoji: the list for ‘funny’ includes a frog, alien, pizza slice, and beer; the one for ‘doubt’ has not only question mark and shrug emoji, but also an eye, a female symbol, a male symbol, and female face, male face, and baby/child face emoji (see the ComLex at https://shanjiang.me/resources/ComLex.csv). One of the less validated topic lists that includes more emotion words has happy, glad, and excite, but also sure, sorry, sick, proud, tire, afraid, fearful, ashamed, lucky, confuse, jealous, and hop. Their lexicon was developed to examine fact-checked social media posts in particular, and its categories should not be taken as psychologically validated emotion lists.
Appendix 2: Emotion Categories (Version 3.32, January 2020)
Appendix 3: Example annotation
TOP: “Shame! Parliament members are building women's hell!” About today's demonstrations and reading lists of disgrace – members of parliament who voted against women's right to health and dignity – in today's Fakty TVN. BOTTOM: “Black protests” of the Razem Party. Thousands of people dressed in black against the abortion ban. TEXT ON SIGNS: Hands off women; Women’s hell continues; Freedom of choice instead of terror. EMOTION ANNOTATION: Anger (80), Excitement (50), Contempt (20), Hate (10), Fear (10).
TOP: Hungarians told immigrants NO! They are finishing building another border wall. The government has labeled illegal immigrants as criminals and is deporting them from the country. And yesterday, Orban announced changes to the constitution. Keep it up! BOTTOM: Orban: immigrants have one week to leave Hungary. TEXT ON IMAGE: You know what matters? Border protection! (rhyming) Keep it up! EMOTION ANNOTATION: Admiration (50), Hate (48), Anger (23), Amusement (25), Excitement (16).
Appendix 4: Correlation table p-values
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Paletz, S.B.F., Golonka, E.M., Pandža, N.B. et al. Social media emotions annotation guide (SMEmo): Development and initial validity. Behav Res (2023). https://doi.org/10.3758/s13428-023-02195-1
DOI: https://doi.org/10.3758/s13428-023-02195-1