Introduction

The COVID-19 pandemic has affected, and continues to affect, lives worldwide in an unprecedented way. At the same time, the amount of information that has been generated during the pandemic is unprecedented. Social media users have created large amounts of publicly available communications that capture their views, opinions, concerns, thoughts, and knowledge about the pandemic. Our research investigates the text content of some of that social media, focusing on Twitter posts (tweets), but also on Google and Wikipedia searches, to study human communications during the pandemic.

We use text analysis to extract and classify opinions and to study how internet search data predict Twitter tweet sentiment. Using both text analysis tools and manual assessment, Twitter tweets are analyzed for their content and expressions of sentiment and psychological content. We study how characteristics of the social media (e.g., pictures or no pictures) lead to different text concepts within Twitter messages. We investigate the relationship between Google and Wikipedia searches and the sentiment of Twitter messages, creating a model relating search and social media. We also examine how a dictionary of key COVID concepts discussed in Twitter tweets is related to the extent to which social media messages get retweeted and add to the sender’s reputation in the context of social media. Throughout, we use text analysis because it provides insight into the social media text provided by Twitter users [1, 2]. Accordingly, we use social media data, in the form of Twitter tweets and Google and Wikipedia searches, all of which were collected during the pandemic. Our specific objectives are to: analyze the sentiment and emotional content in Twitter tweet texts using both computer-based and manual methods; study the differences in text concepts identified in different types of Twitter tweets and retweets; develop a model of “search and tweet” and examine the ability of Google and Wikipedia searches to predict the sentiment in Twitter tweets; investigate the impact of the use of COVID-19-specific concepts (words) on human communication through Twitter; and identify general implications of our findings beyond the COVID-19 pandemic.

This research makes several contributions. First, the analysis shows that the text sentiment and emotional content of Twitter messages with and without video and pictures are statistically significantly different. Second, the text sentiment and emotional content in COVID-19-based tweets and retweets were statistically significantly different. Third, Google searches and Wikipedia page views are predictive of the percentages of positive and negative sentiment tweets. Fourth, a variable representing a dictionary of words capturing COVID-19 concepts was statistically significant in models of Twitter influence and retweets, indicating the impact on human communication. In addition, we propose, as the basis for needed further analysis, a possible COVID-19 health management life cycle and further types of analysis that might prove useful with respect to different types of sentiment (e.g., neutrality, ambivalence).

The remainder of this paper proceeds as follows. The “Data and Methodology” section provides an overview of the data and the methods used. The “Twitter Text Analysis: Tweets vs. Retweets and Pictures vs. No Pictures” section discusses text analysis in Twitter tweets and investigates different levels of emotions in different types of tweets. The “Behavioral Links Between Internet Search and Twitter Sentiment” section examines the behavioral link between internet search, using Google and Wikipedia, and Twitter sentiment. The “Impact of COVID-19 Vocabulary Use on Twitter Reputation and Retweets” section considers the impact of COVID-19-based communications on Twitter influence and Twitter retweets. The “Manual Analysis of Ad Hoc Tweets” section manually analyzes the tweets using a sentiment ontology. The “COVID-19 Management Life Cycle, Emerging Pandemic Issues, and Computational Extensions” section presents the notion of a COVID-19 life cycle and discusses additional approaches and extensions, such as Word2Vec, ensemble methods, and notions of ambivalence. Finally, the “Summary, Contributions, and Conclusion” section summarizes the paper, discusses the implications of our findings and related research, and proposes further extensions.

Data and Methodology

In response to the pandemic, people sought to gain information about COVID-19 from personal interaction and the sharing of stories and content. They shared their concerns, questions, opinions, and knowledge on social media [3]. Accordingly, we use such user-provided content as the data for our research, especially Twitter tweets.

The Emotional Content of Twitter Tweets’ Text

Twitter tweets are often recognized as capturing the sentiment and emotional content of the crowd [4]. With millions of tweets generated daily, Twitter generally is perceived as a useful platform for research. Researchers have investigated many related issues using tweets, including tracking disease propagation, anticipating election results, and predicting sports outcomes [5]. Further recognizing the value of Twitter data, Banda et al. [6] created a large database of tweets related to COVID-19 and made it available to researchers, to provide substantial opportunities for investigation [7]. Other research has also used Twitter data for analyzing issues related to users’ attitudes towards COVID-19 [8,9,10].

Since people express their opinions and ideas in this user-generated content, this online text can be mined to extract the corresponding sentiment in the textual disclosures [11]. There are different approaches to text analysis [12, 13]. Text analysis efforts related to COVID-19 vaccinations have used a wide range of AI-enabled social media analysis on large data sets to accommodate the unstructured nature of the data [14]. As an example, Leibowitz et al. [15] used Linguistic Inquiry and Word Count (LIWC) to investigate the text of Twitter tweets generated by emergency medicine Twitter users and found that approximately 34% of the tweets were positive and 31% were negative; 76.5% focused on the present.

Review of COVID Data Sets and a Data Timeline

This research focuses on studying the early part of the pandemic. We, therefore, track some of the key events from early in the pandemic. Figure 1 shows a timeline of significant COVID-19-related events as the pandemic was identified and progressed during its initial stages when there were predictions that it might end by August 2020.

Fig. 1

Progression of coronavirus into COVID-19 pandemic

During the early phases of the COVID-19 pandemic, many people turned to Twitter as a platform to both post and retrieve information on the virus. Twitter tweets collected throughout this timeframe became the source of data for this research. We collected sets of data from Twitter, divided into roughly two periods: as the pandemic was identified and progressed to countries shutting down, and as countries started to reopen while cases continued to rise. These periods were, approximately, before early to mid-June 2020 and after mid-June 2020.

LIWC

The LIWC text analysis program [16, 17] was applied to the tweets. LIWC is perhaps the leading software for capturing information regarding psychological concepts from text. LIWC uses a psychology-based bag of words approach to analyze text. Tausczik and Pennebaker [18] provide a history of LIWC and the bag of words approach, which is derived from Freud and others and has a long history in psychology. Different concepts are represented within LIWC, such as “positive emotion” and “negative emotion,” but also related concepts such as “anger” and “power.” For each concept, a dictionary of words is included in LIWC. The software is then used to identify the relative frequency of occurrence of these words in a body of text (e.g., Twitter tweets), thus providing a systematic approach to text analysis. Representative concepts examined in this paper are summarized in Table 1.

Table 1 Summary of selected concepts and categories from LIWC
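LIWC’s dictionaries themselves are proprietary, but the bag of words scoring it performs can be sketched as follows; the two tiny category word lists here are illustrative stand-ins, not LIWC’s actual dictionaries.

```python
import re
from collections import Counter

# Illustrative mini-dictionaries standing in for LIWC categories;
# the real LIWC dictionaries are proprietary and far larger.
CATEGORIES = {
    "positive_emotion": {"good", "hope", "safe", "recover"},
    "negative_emotion": {"sick", "fear", "death", "worry"},
}

def category_frequencies(text):
    """Return each category's share of total words, as a percentage."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {
        cat: 100.0 * sum(counts[w] for w in vocab) / total
        for cat, vocab in CATEGORIES.items()
    }

scores = category_frequencies("I worry my friend is sick but I hope she will recover")
```

The summary variables (analytic, clout, authenticity, tone) would then map such raw frequencies onto LIWC’s empirical percentile distributions, which are not public.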

LIWC has two different types of measures: “summary variables” and “categories of words.” Categories of words, for example, “anger” or “power,” are analyzed comparatively, typically within the existing sample, based on their relative occurrence. In contrast, four concepts (analytic, clout, authenticity, and tone) have been established as “summary variables,” which are “standardized composites, based on previous research” [19]. The four summary variable concepts aggregate the frequency of word occurrences from other categories. LIWC’s summary variables are not reported as raw occurrence rates, as the category variables are; instead, the number of occurrences is mapped onto an empirical distribution and reported as a percentile ranging from 0 to 100. The summary variables therefore allow us to make statements about these measures independent of additional comparisons, such as in-sample comparisons.

Although we primarily focus on the summary variables, we also chose the categories of affiliation, anger, health, negative emotion, positive emotion, power, and social, for various reasons: health, because the coronavirus is an issue associated with health; anger, because we expect an angry reaction to the pandemic; and positive and negative emotion, because we expect that tweets about the coronavirus would be emotional. We expected affiliation to be an important distinguishing variable as people connect with each other in a friendship type of gesture. Similarly, we expect that the tweets provide a social outlet for the tweeter. Finally, we included power because the process of tweeting is likely to provide the tweeter with a certain extent of perceived power over a situation.

Our Approach and Use of Data

Users communicate using Twitter, allowing us to capture and analyze conversations in text format. We can use dictionaries to capture concepts, such as sentiment, emotions, or COVID-19, or use other types of words. By focusing on Twitter, we study human communication and behavior during the pandemic. Figure 2 provides an overview of the analysis conducted in this research (Footnote 1).

Fig. 2

Analysis of Twitter posts related to COVID-19

The data sets included: Twitter posts from early March to mid-June 2020 (2200 posts); a set of 900 posts collected in mid-July; and approximately 22,000 online posts collected daily from July–August 2020. These posts were collected by scraping tweets using tools available on the internet, by manual collection, and by using https://birdiq.net/. The first data set was the most specific to COVID-19 and was collected by reading and searching tweets based on keywords such as symptom, infection, fever, fatigue, sick, COVID, and coronavirus. It also included mentions of family members or friends who might have had contact with the virus. The targeted collection was intended to identify the sentiment and emotions of people who were dealing with COVID-19 in a concrete way. The second data set was intended to provide an overall sense of the attitude towards the virus over time. These tweets were collected automatically in July 2020 and cleaned (Footnote 2). They served as input to a text analysis tool. Daily tweets were collected from mid-July until August. Additionally, we collected data from Google searches and Wikipedia page views to study their predictive relationship with Twitter tweets (Footnote 3). A third data set employs additional data drawn from Hussain et al. [14].
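The specific cleaning steps are not detailed above; a minimal sketch, assuming typical tweet noise (URLs, @mentions, hashtag markers, and “RT” prefixes), might look like:

```python
import re

def clean_tweet(text):
    """Minimal tweet-cleaning sketch: drop the 'RT' prefix, URLs,
    and @mentions, strip '#' from hashtags, and collapse whitespace.
    The exact cleaning steps used for the study are assumptions here."""
    text = re.sub(r"^RT\s+", "", text)          # retweet prefix
    text = re.sub(r"https?://\S+", "", text)    # links
    text = re.sub(r"@\w+", "", text)            # mentions
    text = text.replace("#", "")                # keep hashtag words
    return re.sub(r"\s+", " ", text).strip()
```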

Twitter Text Analysis: Tweets vs. Retweets and Pictures vs. No Pictures

In this section, we analyze a random sample of 20,000 tweets gathered in July 2020, chosen based on whether they contained the term coronavirus, in order to study different forms of communication through tweets. We removed the tweets that were not in English, resulting in 14,352 tweets. That set of tweets was analyzed using different comparative approaches: tweets vs. retweets, and tweets with picture or video content vs. tweets without. Of the 14,352 tweets, there were 1627 original tweets, 11,667 retweets, and the rest were replies. We did not analyze replies. Of the 14,352 tweets, 12,021 did not include any picture or video; 2331 did include pictures or videos. We analyzed both the two sets of tweets and the entire dataset using LIWC.
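A sketch of how such a partition might be computed, assuming each collected tweet is a record with hypothetical fields such as lang, retweeted_status, in_reply_to, and media (the actual field names depend on the collection tool used):

```python
# Partition collected tweets into the comparison groups used above.
# The dict keys here are assumptions about the collection tool's output.
def partition(tweets):
    english = [t for t in tweets if t.get("lang") == "en"]
    originals = [t for t in english
                 if not t.get("retweeted_status") and not t.get("in_reply_to")]
    retweets = [t for t in english if t.get("retweeted_status")]
    with_media = [t for t in english if t.get("media")]
    without_media = [t for t in english if not t.get("media")]
    return originals, retweets, with_media, without_media
```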

Text Analysis of the Entire Sample of Tweets: Analytic, Clout, Authenticity, and Tone

Applying LIWC to the entire sample, we found that the tweets averaged an “analytic” score of 74.02, which generally suggests more logical thinking. In addition, the tweets averaged a “clout” score of 66.60, which suggests more confidence than average. However, the tweets averaged an “authenticity” score of only 23.97, which is in the lower quarter of such scores. Finally, the “tone” averaged only 35.46, which reveals anxiety, sadness, or hostility. These results suggest that, on average, tweets about the coronavirus were relatively analytic, and came from a position of some clout. However, the tweets were not authentic sounding, suggesting guarded positions, and, generally, were more negative than positive.

Text Analysis of Tweets Using Pictures and Video Versus No Pictures and No Videos

Tweets can appear as text only or supplemented with pictures and videos. An important consideration is the extent of the impact of the pictures and videos on the text messages. Does the text differ if the tweeter includes pictures or videos? Do people express different text emotions when they supplement their text with videos or pictures? If they do, what does that mean? Do tweeters expect the pictures and video to tell the story? This section investigates some of these issues within the context of the coronavirus pandemic, while raising questions for future research.

We conducted a text analysis of the differences between the two groups of tweets: those that did not have pictures or videos versus those that did. We used a two-sample t-test with unequal variances (Welch’s t-test) to test the differences between the two populations for each of our variables. The results are summarized in Table 2. It is interesting to note that, for each text variable except health, there is a statistically significant difference between the average values for the two categories.

Table 2 Analysis of tweets with and without pictures and video
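The unequal-variance (Welch) t-test used for these comparisons can be computed directly; a minimal sketch:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    (unequal variances), as used to compare the two tweet groups."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb   # variance of each mean
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

In practice a statistics package (e.g., SciPy’s ttest_ind with equal_var=False) would return the p value as well.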

The “no picture and no video” tweets showed statistically significant differences, with more of both positive and negative emotion and more affiliation. Those tweets also contained comparatively more social context words and were presented with greater power. Finally, the text in the no-picture and no-video messages was also statistically significantly more “authentic”; however, the authenticity rating was still approximately in the lower 25%, suggesting a guarded presentation. In contrast, the messages with pictures and videos showed statistically significantly greater analytic vocabulary, clout, and tone.

This analysis suggests different sentiment and emotional content in the two sets of tweets. It does not establish whether: the use of pictures and video leads to changes in the sentiment and emotion in the text; the type of problem that leads to using a picture or video leads to a different type of text; or if people who communicate using video or pictures use different text than those who do not. Alternatively, it may be some combination of the three. Regardless, these are general issues in human communication and behavior that are subjects for future research.

Text Analysis of Retweets Versus Tweets

We analyzed how the text of tweets that were retweeted differs from that of original tweets, with the results summarized in Table 3. There was no statistically significant difference for the variables affiliation and anger, whereas, for each of the other variables, there was a statistically significant difference. The retweets had greater values for clout, health, affiliation, power, social, negative emotion, and cognitive, whereas the original tweets had larger values for the categories of analytic, authenticity, tone, and positive emotion. Thus, the retweets were more negative in tone, suggested cognitive aspects to the tweet, focused more on health, and approached the tweet from a position of power.

Table 3 Analysis of tweets versus retweets

Thus, the results show a statistically significant difference between the text of retweeted tweets and original tweets. What is not clear is whether the sentiment and emotions are related to the likeliness of a tweet to be retweeted. This is a topic for future research, perhaps using behavioral research. Further, although we find a difference in these pandemic-based tweets, whether the same relationship will hold between non-COVID-19 tweets requires future investigation.

General Progression of Sentiment and Emotions Expression

Figure 3 summarizes the findings from the analysis of the Twitter tweets during the early phases of COVID-19. It provides a general timeline of the results from these three data sets, starting with the early manually collected and analyzed tweets and progressing to the LIWC analysis of the automatically collected tweets and retweets (Footnote 4).

Fig. 3

Timeline of sentiment and emotions expression

Behavioral Links Between Internet Search and Twitter Sentiment

In a recent research paper, Hussain et al. [14] analyze the sentiment of Twitter tweets during the COVID pandemic (Footnote 5). For their analysis, the researchers collected 10% of the Twitter tweets in a database of COVID pandemic tweets [20]. As part of their analysis, they determined the relative percentages of COVID-related tweets that had positive, negative, or neutral sentiment over a 37-week period during 2020. For the same period, we gathered both Google searches (from the USA; Footnote 6) and all Wikipedia page views, since country-by-country page view data are not available. We then used those searches and page views to predict the percentages of positive, negative, and neutral sentiment tweets.

Our analysis is based on the behavioral model that people search for information (for example, using Google or Wikipedia) and, after gathering their information, potentially issue a Twitter tweet. In that model, shown in Fig. 4, we would expect the numbers of Google searches and Wikipedia page views to be predictive of Twitter tweets.

Fig. 4

General behavioral model of users’ search for online content and reaction

We used a set of variables to capture the relative percent of tweets that had “positive” sentiment (Pos Sent), “negative” sentiment (Neg Sent), and “neutral” sentiment (Neutral Sent). We computed the correlations of these tweet percentages with the numbers of Google searches and Wikipedia page views (for coronavirus and COVID-19). As seen in Table 4, both the numbers of Google searches and Wikipedia page views are statistically significantly related to the percentages of tweets with positive and with negative sentiment. The numbers of searches and page views are negatively related to the percent of positive sentiment tweets and positively related to the percent of negative sentiment tweets. The percentage of neutral tweets is not statistically significantly related to Google searches or Wikipedia page views.

Table 4 Correlations and p values for models of sentiment

Next, we construct lagged (1-week) variables to test the ability to forecast the relative percentages of positive and negative sentiment tweets. The relative number of Google searches from Google Trends is the lagged variable “Google-1.” We capture the two different sets of lagged Wikipedia page views as “Wiki-Coronavirus-1” and “Wiki-COVID-19-1,” based on two different sets of pages (coronavirus and COVID-19). The correlation results for the lagged variables are summarized in Table 5 and the regression models in Table 6.

Table 5 Correlation and p values for models of sentiment—predictive
Table 6 Predictive regression models of sentiment based on Google and Wikipedia searches
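The construction of the one-week-lagged predictors and their correlation with tweet sentiment can be sketched as follows (pure-Python Pearson correlation; the lag mirrors the “Google-1” style variables above):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def lagged_corr(search_counts, sentiment_pct, lag=1):
    """Correlate week t-lag search volume with week t tweet sentiment,
    i.e., pair each week's sentiment with the prior week's searches."""
    return pearson(search_counts[:-lag], sentiment_pct[lag:])
```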

As can be seen from Tables 4 and 5, the Google search variables and the Wikipedia page view variables are each highly correlated. Unfortunately, that correlation makes using both variables in the same regression equation difficult because of multicollinearity. Accordingly, we used only one of each of the variables in each regression equation, as reported in Table 6.

This analysis showed that both Google searches and Wikipedia page views were statistically significantly predictive of both positive and negative sentiment, but not neutral sentiment, in Twitter tweets. Further, both Google searches (lagged one period) and Wikipedia page views of “COVID-19” and “coronavirus” (lagged one period) are predictive of the percent of Twitter messages with positive and negative sentiment. The percent of Twitter messages associated with positive sentiment was negatively correlated with both Google searches and Wikipedia page views, whereas the percent associated with negative sentiment was positively correlated with both.

As a result, it appears that more Google searches and Wikipedia page views, ultimately, are related to more negative sentiment Twitter messages. This is an interesting behavioral finding that should be examined in other settings to determine if the same relationships hold. This is important because it provides a basis for a potential behavioral link between searches for information (Google searches and Wikipedia page views) and statements or positions issued through social media (Twitter).

Impact of COVID-19 Vocabulary Use on Twitter Reputation and Retweets

This section investigates, as an alternative view, the impact of COVID-19 on Twitter use: that is, whether the use of COVID-19 terms in Twitter tweets is statistically significantly related to an “influence measure” of Twitter users, and whether the use of those COVID-19 words is related to retweets. Doing so allows us to study the pandemic’s effects on human communication by tracking the relationship between occurrences of COVID-19-related words and both social media influence and the reuse of messages. Unfortunately, there has been limited research studying the impact of COVID-19 on these issues (Footnote 7). We, therefore, study the impact using text mining, supported by the generation and use of a dictionary of COVID-19 words. We then investigate the relationships between the dictionary words and Twitter “influence” scores, and between those words and two measures of retweeting. This allows us to gain insight into the impact of COVID-19-based words on communications.

Creating a Coronavirus Dictionary

To assess whether a tweet used COVID-19 concepts, we first generated a dictionary of COVID-19 terms, broadly based on words related to symptoms and controlling the spread of the disease. As a result, we focus on subcategories of detecting the coronavirus (currently have or had in the past), preventing the coronavirus, curing (e.g., a vaccine), symptoms (e.g., fever), and different names for the virus.

There are potentially many different terms that can be used to measure the extent to which text contains information about the coronavirus pandemic. We chose our terms based on the following process. First, we obtained a list of words and phrases that occurred in a list of coronavirus tweets and ranked them by the number of occurrences. Second, words not related to the disease were discarded; e.g., stop words were removed. Third, we reviewed the frequently occurring words to identify the subcategory to which each belonged. Finally, additional words and phrases from the authors were added. Our focus was on developing a set of words aimed at isolating text related to the coronavirus. Table 7 shows the coronavirus dictionary words.

Table 7 Coronavirus dictionary words derived from Twitter
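The first two steps of the dictionary-building process (frequency ranking and stop-word removal) can be sketched as follows; the stop-word list here is illustrative, as the text does not specify which one was used.

```python
import re
from collections import Counter

# Illustrative stop-word list; the actual list used is an assumption.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "i", "my"}

def candidate_terms(tweets, top_n=20):
    """Rank non-stop-words by frequency across a tweet corpus, as a
    starting point for the manual curation described above."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)
```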

We aggregated all of the occurrences of these dictionary words under the category “coronavirus.” As with LIWC, we use a bag of words approach, counting the number of words from the dictionary occurring in the tweets, to measure the effects of COVID-19 words on Twitter communications. As a verification of the importance of the dictionary words in coronavirus text, we conducted a joint quoted Google search to determine each word’s co-occurrence with “coronavirus,” as reported in Table 8.

Table 8 Joint number of occurrences with “coronavirus”

The symptom words (e.g., lost sense of taste) had the lowest co-occurrence in our Google search. However, the symptoms of lost taste and smell in Table 8 seem particularly distinctive of the coronavirus. Despite their potentially low occurrence rate, we assessed that they would allow us to isolate and characterize coronavirus discussions.
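Scoring a tweet against the dictionary, including multi-word phrases such as “lost sense of taste,” can be sketched as follows (the entries shown are a small illustrative subset of Table 7):

```python
# A few illustrative entries; Table 7 lists the full dictionary.
CORONAVIRUS_DICT = ["covid", "coronavirus", "fever", "lost sense of taste"]

def coronavirus_score(text):
    """Count occurrences of dictionary words and phrases in a tweet,
    aggregated into the single 'coronavirus' category used above."""
    text = text.lower()
    return sum(text.count(term) for term in CORONAVIRUS_DICT)
```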

Linguistic Inquiry and Word Count for Control Variables

We used LIWC to provide the control variables for our analysis of influence and retweeting. LIWC was used to capture both structural and semantic information about tweets in order to study the impact of our coronavirus dictionary. LIWC “structural” variables are the number of words in the text (WC) and items related to the style or difficulty of the text, such as the number of six-letter or longer words (Sixltr) and the number of words per sentence (WPS). We chose those control variables for several reasons. Word count provides a measure of the actual length of the message. Both WPS and Sixltr capture the “level” at which the tweets are made: more words per sentence and more six-letter (or longer) words likely indicate a more “educated” tweeter. Alternatively, together, WPS and Sixltr provide measures of “readability” or “ease of readability.” These three variables are measured based on the number of occurrences. Table 9 summarizes the structural variables used in this research.

Table 9 LIWC structural variables used

Although LIWC provides several semantic word sets, we focus on the “summary words” that provide measures of the occurrences of some concepts in the text. They can provide control variables over which to normalize the impact of the text content on our dependent variables, in order to test the specific effects of our COVID dictionary.

Dependent Variables: Influence Score, Retweeted Status User Listed Count, and “Is a Retweet”

We chose three dependent variables: the Twitter influence score, the retweeted status user listed count, and whether the particular tweet was retweeted (“Is a Retweet”), using a dataset generated from https://birdiq.net/twitter-search. Twitter influence scores capture substantial information about the use of Twitter [21]. Canals [22] indicates that the influence score is a joint function of the number of followers, the number following, and the number of posts in the Twitter account, and is heavily historical. The “retweeted status user listed count” and whether a tweet is retweeted (“Is a Retweet”) provide measures of users’ current interest in the information.

Data

We collected a random sample of 900 Twitter tweets from https://birdiq.net/twitter-search during July 2020 using seed words of COVID and coronavirus. Of those 900, we eliminated the ones that were not written in English, bringing the final number of tweets used to 770. It was important to eliminate non-English terms because our analysis used an English dictionary and is dependent on being able to count the numbers of English words in each category. We identified approximately 85% English tweets and 15% non-English, largely Spanish and French.

Empirical Analysis: Correlation and Regression Analysis

We used both correlation analysis and regression analysis to analyze our data. Since “Is a Retweet” is a nominal variable, we investigated it using logistic regression. In the regression analysis, we used variance inflation factors (VIFs) to determine the extent of multicollinearity among our independent variables. Our largest VIF score did not exceed 1.3 and, thus, was well below the standard threshold of 4 in the literature [23], suggesting very limited multicollinearity.
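For a predictor matrix, the VIFs can be computed as the diagonal of the inverse of the predictors’ correlation matrix; a minimal sketch:

```python
import numpy as np

def vifs(X):
    """Variance inflation factors for the columns of predictor matrix X:
    the diagonal of the inverse of the predictors' correlation matrix.
    A VIF of 1 means a column is uncorrelated with the others."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))
```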

Empirical Findings

Our analysis took four different approaches. First, we conducted a correlation analysis between each of our continuous variables. Second, we investigated the Twitter influence score in two steps: first, with the structural control variables and the summary text content variables; and second, with the control variables, the summary text variables, and the coronavirus variable. This allowed us to assess the direct impact of the words in our dictionary. Third, we conducted a regression analysis of the continuous variable, retweeted status user listed count, with all our variables. Fourth, we performed a logistic regression on the nominal variable “Is Retweet,” with our control and dictionary variable.

Correlation Analysis

This section uses correlation analysis to investigate the relationships between our variables with particular emphasis on influence score. The correlations are summarized in Table 10 and the p values in Table 11.

Table 10 Correlation analysis
Table 11 p-Values for correlation analysis

As can be seen from Table 11, the coronavirus variable is statistically significantly correlated with the influence score. In addition, two of the structural control variables, WC and WPS, are statistically significantly correlated with the influence score. Finally, two of the summary text variables, clout and tone, are also statistically significantly correlated with influence score. For those statistically significantly correlated variables, the signs on word count and coronavirus were negative, whereas the signs on the other three were positive.

Regression Analysis of Tweeter Influence Score

Table 12 shows that a model including the structural variables and the summary text variables generates an R-square value of 0.075. Table 13 summarizes the model variables. The two structural control variables, word count and words per sentence, and the summary text variables analytic, clout, and tone were statistically significant. We, therefore, conclude that, for this set of tweets, the influence score is statistically significantly related to the analytic, clout, and tone text variables. Each of the variance inflation factors (VIFs) is less than 4, suggesting minimal multicollinearity (Hair et al. [24] and others).

Table 12 Regression model fit for control variables and summary text variables, without coronavirus dictionary
Table 13 Regression model of influence score for control variables without coronavirus dictionary

Finally, in Tables 14 and 15, we add our new coronavirus dictionary variable. The R-square increases to 0.101, a statistically significant increase. In addition, the coefficients on the same control variables (WC and WPS) and on one of the summary text variables (tone) remain statistically significant. The coefficient on our coronavirus dictionary variable is also statistically significant and negative, as in the correlation matrix.

Table 14 Regression model fit for control, summary text, and coronavirus variables
Table 15 Regression model of influence score for control, summary text, and with coronavirus variable
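Testing whether an R-square increase from adding a predictor is statistically significant corresponds to a standard incremental F-test, which can be sketched as:

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def incremental_f(X_base, X_full, y):
    """F statistic for the R-squared increase from adding predictors,
    e.g., adding the coronavirus variable to the base model."""
    n = len(y)
    q = X_full.shape[1] - X_base.shape[1]   # number of predictors added
    k = X_full.shape[1] + 1                 # parameters in the full model
    r2b, r2f = r_squared(X_base, y), r_squared(X_full, y)
    return ((r2f - r2b) / q) / ((1 - r2f) / (n - k))
```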

Analysis of “Retweeted Status User Listed Count”

We use the same model as in the previous analysis of influence score to study one aspect of the impact of retweeting: the retweeted status user listed count (RSULC) of those re-tweeters. The measures of fit for the equation are in Table 16 and the regression model in Table 17.

Table 16 Regression model fit for retweeted status user listed count model
Table 17 Regression model of retweeted status user listed count model for control, summary text, and coronavirus variables

These results suggest that the vocabulary in the coronavirus dictionary is positively related to the RSULC. Further, each of the structural variables had p values that were statistically significant.

Analysis of “Is Retweet”—Logistic Regression

In this section, we discuss the analysis of the dependent variable “Is Retweet,” which takes on two values: true and false. Because the variable is nominal, we use logistic regression to analyze the data. The measures of fit are summarized in Table 18 and the coefficients in Table 19, for the model of “Is Retweet” with the control variables, summary text variables, and coronavirus variable. The p values on the coefficients of the semantic control variables WC and WPS were statistically significant, as they were in the estimation of influence score. However, whereas tone was statistically significant in the estimate of influence score, in the case of “Is Retweet,” the p values on the coefficients for clout and authentic were statistically significant. Finally, the p value on coronavirus was also statistically significant in estimating the variable “Is Retweet,” as it was in the model of influence score. The results in Table 19 indicate that only the coefficients on word count and coronavirus were positive, whereas those on words per sentence, clout, and authentic were negative.

Table 18 Logistic regression measures of fit for “Is Retweet” model
Table 19 Logistic regression coefficients for structural, summary text, and coronavirus variables estimate of “Is Retweet” model
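A minimal logistic-regression sketch for a binary outcome like “Is Retweet” follows, fit by gradient ascent in NumPy. The features and coefficients are illustrative stand-ins; the synthetic signs (positive on WC and coronavirus, negative on WPS, clout, and authentic) merely echo the pattern reported in Table 19.

```python
# Sketch: logistic regression for a binary "Is Retweet" outcome.
# Synthetic data; feature names and coefficients are illustrative only.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
# Columns: WC, WPS, clout, authentic, coronavirus
true_beta = np.array([0.5, -0.4, -0.3, -0.2, 0.6])
X = rng.normal(size=(n, len(true_beta)))
p = 1 / (1 + np.exp(-(X @ true_beta)))
y = rng.binomial(1, p)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Maximum-likelihood fit by gradient ascent, with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(steps):
        mu = 1 / (1 + np.exp(-(A @ beta)))        # predicted probabilities
        beta += lr * A.T @ (y - mu) / len(y)       # log-likelihood gradient
    return beta

beta_hat = fit_logistic(X, y)
print(np.round(beta_hat[1:], 2))  # estimated slopes (intercept dropped)
```

With enough data, the estimated slopes recover the signs of the generating coefficients, which is the property the paper's interpretation of Table 19 relies on.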

Summary

This section compares the results across the three dependent variables. The p values for word count and words per sentence are statistically significant throughout this section. In the regression model of the influence score, the coefficient on the word count has a negative sign. However, for the logistic regression model of the retweeting dependent variable, the sign is positive. Words per sentence is statistically significant in each of the three models. The coefficient of the text variable tone is positive and has a statistically significant coefficient in the regression model of influence score. However, the coefficients on clout and authentic were negative and statistically significant in the model of retweets. Finally, the p values for our coronavirus dictionary are statistically significant for all three dependent variables. The statistically significant results on our coronavirus dictionary suggest that the coronavirus pandemic has created a vocabulary of its own, including the previously unknown term COVID-19, and that vocabulary influences human communication as captured in Twitter.

Why does our dictionary have a negative sign in the estimation of the influence score and a positive sign in the estimation of the retweet measures? We conclude that this is an indication of change in the information being diffused, and that such information needs a “new dictionary” to identify it. Influential Twitter users have an established set of followers, posts, and topics. As a result, there is likely to be a consistency in their tweets. However, a new and important topic, such as the coronavirus pandemic, can attract great interest and result in retweets. Our results suggest that those creating the coronavirus tweets are a different set of users, diffusing a different set of information than more established Twitter users normally would.

These results should be important for information technology research. The coronavirus dictionary allowed us to track the changes in vocabulary use in Twitter communications. This comparison between the impact on influence score and retweets allows us to monitor these changes. Special emerging technology dictionaries could be used with a range of technologies to capture and measure the information diffusion associated with such technologies.

Manual Analysis of Ad Hoc Tweets

In addition to the computer-based analysis, we manually analyzed the posts at a finer granularity, by adopting the work of Scherer [25], who identifies 36 ontological categories that deal with affect: admiration/awe, amusement, anger, anxiety, being touched, boredom, compassion, contempt, contentment, desperation, disappointment, disgust, dissatisfaction, envy, fear, feeling of affection/love, gratitude, guilt, happiness, hatred, hope, humility, interest/enthusiasm, irritation, jealousy, joy, longing, lust, pleasure/enjoyment, pride, relaxation/serenity, relief, sadness, shame, surprise, and tension/stress. These terms help to identify sentiment in natural language [26, 27].

We read each of the tweets and classified them as factual or emotional, based on Scherer’s [25] categories. Approximately 55% of the posts were factual, simply referring to a fact (without emotion or sentiment) intended to be true at the time of posting. Other tweets were factual with some type of emotion and included direct reporting of patient experiences (5%). The remaining tweets reflected only a few emotions (anger, anxiety, desperation, disgust, hope, and surprise).

An example of a tweet classified as factual (which could be falsified later) was: “The spread of #COVID19 by an asymptomatic or someone who is not showing any symptoms appears to be less likely, said #WHO (@WHO) in the recently published summary of transmission of COVID-19 including symptomatic, pre-symptomatic and asymptomatic patients.” An example of the emotion fear was: “A patient with symptoms of a heart attack refused treatment after reading on Facebook that she would die if she went to hospital during the COVID-19 crisis.” Another tweet (again subject to later falsification), but intended to be factual, was: “@CDCgov issued some very useful current best estimates:—About 1/3 of COVID-19 infections are asymptomatic.—40% of transmission is occurring before people feel sick.—Time from exposure to symptom onset: ~ 6 days on average.”

Data

Using an online tool, https://birdiq.net/Twitter-search, we manually collected tweets based on keywords, such as COVID-19, CDC, and WHO. These tweets (over 2000) were reviewed to identify sentiments and insights that would not be possible to extract using an automated tool, again, to gain insight into user behaviors. We strove to show the value of manual mining, recognizing that this type of analysis is not feasible on a large scale. Table 20 provides examples of tweets and classifies them based on their actual or assumed significance.

Table 20 Data set of ad hoc tweets

From this sample, the most likely keyword, COVID-19, revealed a variety of expected tweets on: the spread of the virus, the seriousness of it based on experiences and testimonials, testing, innovative ways of approaching testing and treatments, and others. The tweets shown from the WHO and CDC relate to advice and awareness.

Searches for Nuggets

The notion of the wisdom of the crowds implies that sometimes the crowd is able to perform better than individuals [28]. We investigated whether the content, as provided by the user community of Twitter (the crowd), could provide insights that might be helpful to the general public, or perhaps even medical professionals. The types of insights we were looking for required a human to identify what might be useful content and extract ideas at the tweet level of analysis [29]. Therefore, we reviewed approximately 2000 tweets posted throughout the pandemic. We attempted to identify nuggets; that is, pieces of information with the potential to have real value or use, beyond just a post. Examples of potentially influential tweets are given below. The first is a best practices suggestion.

Tweet (factual/sharing of best practices): #itvnews Many German patients were given oximeters in the community back in April. Other places have also recommended this. https://www.thailandmedical.news/news/COVID-19-tips-oximeters,-a-potential-home-tool-to-monitor-progress-of-COVID-19-symptoms-from-mild-to-moderate-and-to-detect-COVID-19-pneumonia-early

The following tweet shows passing on blood type information from a legitimate news source. Such information could be useful for someone assessing their own risk.

Tweet (blood types). This study finds COVID patients with type A blood are at much higher risk of developing life-endangering symptoms, patients with type O blood experience a “protective effect” https://www.nytimes.com/2020/06/03/health/coronavirus-blood-type-genetics.html

However, a later study by Harvard showed that people who were symptomatic and had blood types of B+ or AB+ were more likely to have a positive COVID-19 test than people who were symptomatic with blood type O.Footnote 8

The following tweet is factual. Knowing how long the illness might last would be useful to anyone concerned with whether they are experiencing a typical duration.

Swiss TV news (factual): Half of patients (500/1000) contacted by a COVID 19 follow-up service report symptoms after 6 weeks https://www.rts.ch/play/tv/19h30/video/le-virus-recule-et-le-nombre-de-gueris-du-COVID-sont-tres-nombreux--mais-cette-nouvelle-maladie-laisse-parfois-des-traces-?id=11370777

The following tweet reports on a medical study and would be useful for anyone concerned with how seriously the virus might infect them.

Factual: Low levels of the prognostic biomarker suPAR are predictive of mild outcome in patients with symptoms of COVID-19 - a prospective cohort study. Authors: jesper eugen-olsen, Izzet Altintas, Jens Tingleff, Marius St... http://medrxiv.org/cgi/content/short/2020.05.27.20114678

The following two tweets show associations of patient characteristics and occurrence of the disease. These posts are interesting in the sense that the associations being made are non-intuitive. However, they serve as examples of the types of tweets that might trigger self-reporting of whether a person falls into one of these categories, which, in turn, could lead to further investigation to uphold or falsify the conclusions from the reports.

Factual (implications true or falsified later): In one report, dermatologists evaluated 88 COVID-19 patients in an Italian hospital and found 1 in 5 had some sort of skin symptom, mostly red rashes over the trunk. https://inq.news/COVID-toes

Factual (implications true or falsified later): Most #coronavirus patients had no hair https://www.hulldailymail.co.uk/news/uk-world-news/bald-men-could-risk-more-4194866

The tweets below could be important because they provide information on the virus itself, as well as a potential treatment, but are not scientifically proven.

COVID-19 maybe mutating but it’s for the good. Doctors in Italy have claimed that the symptoms of COVID-19 and their intensity is less than what they experienced with the first wave of patients. This suggests that COVID-19 gets weaker as it spreads. https://elemental.medium.com/could-the-coronavirus-be-weakening-as-it-spreads-928f2ad33f89

A new drug, #famotidine, available over-the-counter for relieving #heartburn, has shown promising results in treating the symptoms of #COVID19 https://www.firstpost.com/health/heartburn-drug-famotidine-may-reduce-symptoms-of-non-hospitalised-COVID-19-patients-suggests-case-series-8452421.html

As these tweets illustrate, they provide useful, or potentially useful, information when so much is unknown about this global crisis. Human judgment is needed to assess the validity of the claims in the tweets with scientific study clearly required for some of them. However, the potential value of the information contained within a tweet could not easily be obtained by software.

Twitter Use

People generally turned to Twitter as a platform to make sense of the pandemic. The tweets showed that people also wanted to provide useful information for others, sharing their opinions and knowledge. There were many posts expressing compassion, triggered by personal situations.

Example (desperation/disgust): My father, 62 yr suffering from high fever (103-104) from 9 days with no other apparent symptoms. He tested negative on COVID 19. He has history of CABG in 2006. Our family doctor advised to get him admitted. No hospital is accepting patient with fever. Pls help #caremongers

Example (factual/tension/stress): My friend is a nurse & finally broke her silence. She said she’s seeing COVID-19 patients leaving the hospital after COVID with kidney damage. Others will suffer with COPD like symptoms for the rest of their lives. It’s very scary.

Example (disgust) MILD. There’s a huge amount at stake in term mild – for gov actors, health service planners, clinicians, patients, carers...In the days when my own ‘mild’ #COVID19 symptoms have been manageable (Day 52 now), I reflected on mild COVID-19 for @somatosphere https://t.co/rQ9wFdcSQ7?amp=1

This mining revealed a great many posts with different perspectives. Many posts were intended to provide useful information. However, some of the posts that reported information considered to be factual (e.g., that there was no need to wear masks) had the potential to later be proven false.

Ideally, the mining for nuggets could produce insights for the management of the virus. For example, some cases reported on successful convalescent plasma treatments, leading to requests for plasma donations from recovered patients. Other tweets reported some members of a family getting the virus while others living at the same location did not. Such reports might be of interest to researchers trying to find commonalities in these cases. The Appendix contains tweets mined from an additional data set. They reveal a combination of medical innovations (attempted or actual), health information, sentiment and personal reports, opinions, and creative comments. The tweets reveal a need to contribute to an ongoing crisis by providing medical information, contributing to the global conversation on COVID-19, or seeking help.

Themes—Summary

The use of Twitter as a critical social media tool in times of major communication needs was obvious, with Twitter text providing valuable insights into users’ opinions and attitudes. The same held true as other world events unfolded; for example, the Arab Spring and Japan’s earthquakes [6]. For COVID-19, the sentiment analysis revealed a change over time as the pandemic progressed. The most notable trend was that tweet content progressed from providing, and seeking, factual information to expressing emotions, including anger. Prior research found that Twitter, along with other social media, could be used as a predictor of COVID-19 cases and other threats to community health [30]. It is likely there will be continued use of these platforms. The development of large databases of tweets or other user-generated content should, thus, continue to provide substantial research opportunities to investigate COVID-19 or other issues related to global challenges of such magnitude [7].

Content themes emerged. The tweets emphasized information on testing, treating, reporting of well-known figures who tested positive, warnings about the severity of the disease, and other health-related information. Additional themes related to politics, reported scientific breakthroughs (some of which were later shown to be false), economics, reopening of schools and businesses, and others.

We attempted to understand the content of the tweets using sentiment analysis. Many tweets were factual; others showed predictable sentiments of anger, desperation, and hope. Of interest was how Twitter might be used to identify information nuggets, in the traditional sense of a valuable idea. This involved manual inspection and mining. One nugget was a relatively early suggestion that a hospital in India collect the blood of patients who had recovered from COVID-19. Later, the identification of blood type was scientifically investigated as an indicator of the risk of experiencing the disease. However, there does not appear to be a way for a computer program to connect these two, demonstrating the limitations of tools to extract inherent information in text data [12].

In the same way, there is much intentional or non-intentional sharing of misinformation, often referred to as fake news [31, 32]. A computer program that can deal with sentiment well might be able to identify tweets with specific content and others with opposite or contradictory content. We did not, for example, investigate tweets that suggested the COVID-19 pandemic should not be taken seriously. Instead, we considered reasons why people elected to share content. Representative examples are provided in Table 21.

Table 21 Sharing of Twitter content

Many other investigations are possible. For example, could the impact of international protests be factored into the sentiment analysis? Is it possible to identify a “tipping” point where people realized the importance of being vigilant (wearing masks, etc.) based on posts reported by infected people relaying the seriousness of the disease to others?

Twitter has been used for social debates and expressions of public opinion (e.g., [33]). No doubt, it will continue to be used in this manner for topics of large, public impact. However, with millions of tweets being generated each day, our study has involved a limited number of texts, restricted to those written in English. It would be useful to expand the categories of sentiment we use, as well as to determine whether there were any age group or gender differences in the negative tweets. Finally, finding the true nuggets will, no doubt, require a large-scale, semi-automated approach, but doing so might help to identify insights that could lead to the development of better sentiment analysis tools.

What we learn from Twitter as a platform is its potential to reach a large audience and provide much information, informative or otherwise. Of course, many claims are made, but it is not possible to verify them without scientific experimentation and reporting of actual numbers. For example, at one point, based on data from Italy and the UK, a website reported men as having approximately twice the number of deaths as women.Footnote 9 Finally, not all insights can be obtained using existing sentiment analysis tools, and only a limited amount of insight can be obtained from manual mining.

COVID-19 Health Management Life Cycle, Emerging Pandemic Issues, and Computational Extensions

The authors would like to thank the guest editor for suggesting some of the content in this section.

This paper has used data collected early in the COVID-19 life cycle. At the time of data collection, it was unimaginable that multiple COVID variants would emerge. Nor was it foreseeable that, after 2 years, the end of the pandemic would not be in sight. However, these realizations suggest that the COVID-19 pandemic has a sustained life cycle with many events that also could be investigated. Because that life cycle has many implications, we examine the basic notion of a COVID-19 life cycle and some of its implications. In addition, we examine some computing extensions for using bags of words to address issues beyond the psychological concerns addressed in this paper. We examine the potential use of Word2Vec and other approaches in future models and generate a list of business-based COVID-19 words. We also investigate the potential opportunities for the application of symbolic and sub-symbolic AI for sentiment analysis of COVID-19 as well as other emerging trends.

COVID-19 Health Management Life Cycle and Related Problems

As the COVID-19 pandemic continues, with new variants such as “omicron” emerging and no solid end in sight, it is clear there is a life cycle to the pandemic (e.g., [34]) that affects healthcare planning, management, and resource allocation. The emergent COVID-19 health management life cycle has much in common with technology life cycles, such as the maturity curve, the hype curve, the adoption curve, and others (e.g., [35]). Unfortunately, what is not clear is the specific set of concerns, markers, or events within that pandemic life cycle. Some of the emerging activities within the life cycle appear to include issues such as managing new outbreaks of COVID, integrating health management efforts across multiple countries, and freeing up and allocating resources, along with other concerns whose difficulties and solutions likely have not been established completely because the disease, literally, has been emerging, diffusing, and evolving. A health management life cycle model could be useful for identifying problems associated with each stage as the disease works through its life cycle. The beginning of one such view of a life cycle is provided in Fig. 5, which includes potential life cycle stages on the horizontal axis and potential problems associated with those stages on the vertical axis. As a companion to this approach, it is easy to imagine a COVID-19 version of the hype cycle that traces the COVID-19 technologies (vaccines, antivirals, infusions, etc.) over time and over stages such as the “Peak of Inflated Expectations,” the “Trough of Disillusionment,” and the “Slope of Enlightenment” [36].

Fig. 5

Potential Health Management model of a COVID-19 life cycle and related problems

This life cycle model could be helpful in text analysis by providing insights in a number of directions. For example, this research is concerned with psychological issues of communication in social media, suggesting the importance of a text mining approach centered in psychology and helping us choose the tool, LIWC. Across the life cycle, there are likely different psychological problems that might be identified from analysis of text communications. However, analysts may be concerned with other stages and other problems in the life cycle, requiring a different context than a psychological one. In those settings, analysts may need to generate different bags of words to gather meaning from different contexts about different problems, such as outbreak management or integrating efforts across countries.

Generating Bags of Words in Alternative Contexts for Alternative Problems

A number of approaches could be used to generate bags of words for different contexts in the COVID life cycle. Word2Vec [37] provides two approaches that allow the generation of words that are similar to a seed word: the “Continuous Bag of Words” (CBOW) model and the “Continuous Skip Gram” (Skip Gram) model. Word2Vec identifies words that are similar to a seed word or words within the text from which they are gathered. For example, as Mikolov et al. [37, p. 5] noted in their analysis of text, the approach can capture relationships between similar words, such as how “… France is to Paris as Germany is to Berlin ….” Using both CBOW and Skip Gram, we generated a set of 39 words drawn from a business corpus, presented in Table 22.

Table 22 List of words using Word2Vec
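The analogy property cited from Mikolov et al. can be illustrated without training a model: trained Word2Vec embeddings support vector arithmetic plus cosine similarity, which the hand-made four-dimensional vectors below merely imitate. The vectors and vocabulary are toy stand-ins, not real embeddings.

```python
# Toy illustration of the Word2Vec analogy "France is to Paris as
# Germany is to ?": vector arithmetic plus cosine similarity.
# The embedding vectors below are hand-made stand-ins.
import numpy as np

emb = {
    "france":  np.array([1.0, 0.1, 0.9, 0.0]),
    "paris":   np.array([1.0, 0.9, 0.9, 0.0]),
    "germany": np.array([0.0, 0.1, 0.9, 1.0]),
    "berlin":  np.array([0.0, 0.9, 0.9, 1.0]),
    "economy": np.array([0.2, 0.2, 0.1, 0.2]),
}

def most_similar(target, exclude):
    """Return the vocabulary word whose vector is closest to `target`."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude),
               key=lambda w: cos(emb[w], target))

# paris - france + germany should land near berlin
query = emb["paris"] - emb["france"] + emb["germany"]
answer = most_similar(query, exclude={"paris", "france", "germany"})
print(answer)  # berlin
```

In practice, a library such as gensim exposes the same operation over trained embeddings; the sketch only shows the underlying geometry.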

An analysis of those words reveals the business concerns, e.g., “downturn,” “slowdown,” “economy,” and “recession,” captured in the corpus. In addition, the list includes other related risks to business, such as “BREXIT” and “fears.” Further, some of these words, although related to COVID-19, are not uniquely associated with the pandemic, such as “economy” and “abates.”

Poria et al. [38] and Araque et al. [39] suggest that text analysis should employ ensemble approaches. Interestingly, there are two different approaches within Word2Vec, so its use inherently provides the perspective of an ensemble of methods. In addition, other approaches, such as GloVe [19], can be included with the two approaches within Word2Vec to broaden the ensemble of methods. Each of these algorithms could be used to generate sets of words from different seed words. However, using multiple methods generates redundancies and words that, in general, may not be directly related to the seed word(s) in the sense with which the analysis is concerned. As a result, it is important to include a “human-in-the-loop” in the ensemble approach when generating a bag of words about a concept.
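One simple way to realize this ensemble-with-human-review idea is to merge the candidate lists from each method and rank words by how many methods proposed them, then hand the ranked list to a human reviewer. The word lists below are illustrative stand-ins; real lists would come from each model's nearest-neighbour query on the seed word.

```python
# Sketch: merging candidate word lists from several embedding methods
# (e.g., CBOW, Skip Gram, GloVe) before a human review pass.
# The lists are illustrative only.
from collections import Counter

cbow     = ["pandemic", "lockdown", "economy", "recession", "brexit"]
skipgram = ["pandemic", "quarantine", "economy", "slowdown"]
glove    = ["lockdown", "downturn", "economy", "fears"]

# Words proposed by more methods are stronger candidates for the dictionary;
# the ranked list is then pruned by a human-in-the-loop.
counts = Counter(cbow + skipgram + glove)
candidates = [w for w, _ in counts.most_common()]
print(counts.most_common(3))
```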

Finally, Cambria et al. [40] investigated the application of symbolic and sub-symbolic AI for sentiment analysis. Their approach to capturing meaning integrates top-down and bottom-up computing, employing both sub-symbolic computational approaches and symbolic logic and semantic network approaches. In so doing, they built a new version of SenticNet. Future research can focus on integrating this approach to build better word sets that match the domain-specific needs of particular locations in a COVID-19 life cycle, generating the word sets for the problems as needed.

Ambivalence

Recently, researchers have begun to explore additional approaches to measuring neutral sentiment. Although there are approaches to capturing neutral sentiment, as discussed in the “Impact of COVID-19 Vocabulary Use of Twitter Reputation and Retweets” section and in the Python natural language toolkit, Wang et al. [43] recently developed a more fine-grained approach to measuring ambivalence. This is important because, although much COVID-related activity is emotionally charged, resulting in demonstrations worldwide, some issues apparently garner ambivalence.Footnote 11 For example, Peng and Chen [42] investigate emotional ambivalence and luxury good consumption during the COVID pandemic. However, as noted by Craig et al. [41], capturing ambivalence can depend on the specific issues being considered and the way in which questions are worded, further emphasizing the importance of specific words.
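The distinction between neutral text (no affect) and ambivalent text (mixed affect) can be sketched with a toy lexicon: a tweet matching both positive and negative words is ambivalent, one matching neither is neutral. The word lists are illustrative only; calibrated tools such as NLTK's VADER provide the production-grade version of the neutral score.

```python
# Sketch: separating neutral (no affect) from ambivalent (mixed affect)
# text with a toy lexicon. Word lists are illustrative stand-ins.
POS = {"hope", "relief", "grateful", "recovered"}
NEG = {"fear", "anger", "scary", "died"}

def affect_label(text):
    """Label text as positive, negative, neutral, or ambivalent."""
    words = set(text.lower().split())
    pos, neg = len(words & POS), len(words & NEG)
    if pos and neg:
        return "ambivalent"   # mixed affect, not merely absent affect
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"

print(affect_label("cases reported today"))                  # neutral
print(affect_label("scary week but grateful she recovered")) # ambivalent
```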

Summary, Contributions, and Conclusion

As of 6th May 2020, there were almost 4 million known, confirmed cases of COVID-19 worldwide. By mid-July, that number had more than tripled. By September, it had reached 30 million cases, with close to one million deaths.Footnote 12 By December 2021, over 250 million cases and 5 million deaths had been documented. By May 2022, those numbers had grown to over 530 million cases and over 6 million deaths. Many global efforts are still being taken to combat the virus. As ordinary people seek to understand the virus and learn how to protect themselves, they frequently turn to online platforms, such as Twitter, which is often regarded as a good resource from which to analyze opinions from user-generated content [44].

Contributions—Human Communication

This paper makes several contributions to knowledge about human communication using social media, couched in the use of Twitter within the COVID-19 pandemic, which lead to interesting questions for future research. First, we found that the text sentiment and emotions of Twitter messages with and without video and pictures are statistically significantly different. Although it is not clear why, the findings suggest important differences. Is it a general characteristic of human communication that using pictures results in different text sentiment than if pictures are not used?

Second, the sentiment and emotions of the retweets and the original tweets in the pandemic were statistically significantly different. Future research should investigate the extent to which this finding can be generalized. Is there something in the sentiment and emotions of a tweet that makes it likely to be retweeted? Is it a general characteristic of human communication to use this particular type or amount of sentiment?

Third, Google searches and Wikipedia page views are predictive of the percentage of positive and negative sentiment tweets, suggesting that humans perform internet searches and then communicate the results via social media. Future research should investigate the extent to which this phenomenon occurs in non-pandemic settings and can be considered a general model of human behavior.

Fourth, a variable representing a dictionary of words capturing COVID concepts was statistically significant in models of Twitter influence and retweets. It appears that communication of new topics pursued by new users results in retweets, in contrast to tweets from those with large influence scores derived from established pools of followers and topics. This finding can be used to support future research on new user groups or on technology use during a major event (e.g., [45]).

Relationship to Previous Research with LIWC

We have not discovered similar research against which to benchmark the findings in our analyses. However, very recently, other researchers have used LIWC for various types of related research into COVID-19-based concerns. Silva et al. [46], for example, used LIWC and Twitter tweets to investigate issues associated with misinformation. Barnes [47] used LIWC in an analysis of “terror management theory.” Safa et al. [48], similarly, analyzed the detection of symptoms of depression in Twitter tweets. Ebeling et al. [49] used LIWC to investigate the impact of political polarization during the COVID-19 pandemic. Mosleh et al. [50] used LIWC to analyze correlations with behavior. These efforts support our use of LIWC, although they are limited in their application to the issues examined in this paper.

Conclusion

The COVID-19 pandemic continues to be a topic of much global interest for both health and economic reasons, as new variants evolve. This research has analyzed text data from Twitter to gain an understanding of human communication based on user-supplied content during the pandemic. Twitter tweets were analyzed manually and using a text analysis tool. The results show changes in user participation over time from information seeking to expressions of anger or other emotions. Users retweeted different content with clout and were most concerned with health. Tweets that include pictures and movies have different text than those that do not. The percentage of positive versus negative sentiments found in COVID-19 tweets could be predicted by Google searches and Wikipedia page views. This research can also be considered as an analysis of human communication where new concepts are discussed using text and images, which provide a firm foundation from which to analyze the implications of events or situations that have wide-spread consequences, such as a pandemic or a natural disaster.