How does “A Bit of Everything American” state feel about COVID-19? A quantitative Twitter analysis of the pandemic in Ohio

Abstract

COVID-19 has proven to be one of the most important events of the last two centuries. This defining moment in our lives has created wide-ranging discussions in many segments of our societies, both politically and socially. Over time, the pandemic has been associated with many social and political topics, as well as sentiments and emotions. Twitter offers a platform to understand these effects. The primary objective of this study is to capture awareness and sentiment about COVID-19-related issues and to find how they relate to the number of cases and deaths in a representative region of the United States. The study uses a unique dataset consisting of over 46 million tweets from over 91,000 users in the 88 counties of the state of Ohio, together with a state-of-the-art deep learning model, to measure and detect awareness and emotions. The data collected are analyzed using OLS regression and a System-GMM dynamic panel model. Findings indicate that the pandemic has drastically changed the perception of the Republican Party in society. Individual motivations are strongly influenced by ideological choices, and this ultimately affects individual pandemic-related outcomes. The paper contributes to the literature by (i) expanding the knowledge on COVID-19, (ii) offering a representative result for the United States by focusing on an “average” state like Ohio, and (iii) incorporating sentiment and emotions into the calculation of awareness.

Introduction

Coronavirus disease 2019 (COVID-19) was first identified in December 2019 in Wuhan, China, and was declared a pandemic by the WHO on March 11, 2020 (who.int). Speculatively, one of the most interesting aspects of COVID-19 has been that it reinforced some pre-existing divides between economically more and less developed societies; however, due to the lackluster performance of some developed nations against the pandemic, COVID-19 has also, surprisingly, created new winners. These expected and not-so-expected results make us wonder: Why? Why are some societies and regions more successful at fighting COVID-19, whereas others are, politely speaking, struggling? The reasons might be geographic and physical endowment (human capital, geographical size, population density), structural (institutional quality, efficiency of healthcare), and also social. This paper looks at the latter aspect by analyzing the relationship between the situational awareness of people, the emotions they reflect on Twitter, and the number of COVID-19 cases and deaths. Specifically, awareness and emotions are treated both as explanatory factors and as outcomes, whereas the number of cases and deaths in each county is treated only as an outcome. The study uses county-level data from Ohio, USA, a state that is socially and politically suitable for making generalizable conclusions about the United States. The analysis covers over 46 million tweets collected from over 91,000 users in the 88 counties of Ohio between January 1 and April 30, 2020.

A main source of motivation for the study is that social media offers real-time content and therefore, when appropriately analyzed, can be a strong ally of combat and prevention programs. Thus, Twitter can be used to analyze a representative sample of the population and contribute to health vigilance in local communities. The ultimate goal is to support the local governments and health authorities that need to make health-related decisions.

Ohio has long been considered an “average” state not only socially and economically, but also ideologically. As The Economist once suggested, “This slice of the Midwest contains a bit of everything American—part north-eastern and part southern, part urban and part rural, part hardscrabble poverty and part booming suburb” (Economist, December 20, 2005). These characteristics give Ohio a unique position: both its response to the COVID-19 crisis and its general social and ideological stance can be taken as representative of the United States at large. Thus, results obtained from a socio-political analysis of Ohio may have representative implications for the country. In fact, the results obtained from this study suggest that COVID-19 may be a defining moment for the perception of Republicans, with the “rally around the flag” effect of the initial phases of the pandemic having largely lost its impact as the pandemic progressed.

The paper provides an overview of the literature on extreme events, the use of Twitter (and, specifically, hashtags) in social science research, the connection between awareness and emotions, and health communication. The literature section is followed by the research questions, an introduction of the data, the analysis, and the results.

Literature on extreme events and pandemics

The high level of devastation caused by this unique extreme event has motivated many scholars to conduct research on various aspects of COVID-19. A quick online search indicates that more than 5,000 papers have been written about the pandemic. Among the prominent examples, an eye-opening study by Cinelli et al. [16] looks at the spread of misinformation (usually called infodemics in the literature) from a comparative perspective, using Twitter, Instagram, YouTube, Reddit, and Gab. Similarly, Gallotti et al. [27] analyze the reliability of Twitter messages before and after the pandemic arrives in a specific country. Other examples include the use of GIS technologies to support the global fight against outbreaks and epidemics [11].

Twitter, as a social media platform, is a valuable source for gauging the dynamics in a society. As Bourdieu once suggested, the linguistic marketplace [12] is a platform where people provide information about themselves for social gain; a modern interpretation of this concept would be Twitter. The social gain can be individualistic or collectivistic. In addition to providing a fairly accurate picture of the self, Twitter also precisely reflects the social and economic hierarchies that exist in offline contexts [52]. This paper stands at the intersection of epidemics and social media research, and hence aims to contribute to the literature by focusing on the most recent global pandemic, SARS-CoV-2.

Previous experience shows that extreme events and epidemics provide a unique opportunity to be scientifically productive. A valuable study by Tang et al. [63] provides a systematic review of the literature on social media and outbreaks of emerging infectious diseases, concentrating on H1N1 and Ebola. The authors find that studies suffer from a lack of theorization and need more methodological rigor. Among the 43 papers reviewed, 16 use Twitter as the source of data. Other social media research examples from the literature on extreme events include the 2011 London riots [20], Hurricane Sandy in 2012 [34], and the 2013 European floods [57]. Although pandemics are quite rare compared to other extreme events, a few studies use social media to study pandemics as well. One example by Fung et al. [25] examines the amplified fear of the imported Ebola virus in the USA through Twitter. Another example looks at real-time classification of Twitter streams during epidemics [39].

Other related studies include Missier et al. [45], who focus on Twitter and the Zika epidemic by training a classifier to discover the top users actively posting relevant content about the topic. Further examples look at the adoption and utilization of social media during extreme events [22, 35, 42, 48]. Like many other examples [29, 47], this study holds that social media is a very useful source of information during extreme events. Most importantly, social media helps users develop situational awareness during an extreme event and gain higher resilience [56].

On the whole, a goal of this study is to bring several fields of research together in an interdisciplinary framework: very few studies combine an analysis of communication structure, emotion classification, and social network analysis of an extreme event in a single study; one rare example looks at three terrorism events from the 2010s in the Western world [62].

Literature on NLP, network analysis, and Twitter methodology

Measuring awareness forms an important component of this paper. The definition of awareness may be subject to interpretation; nevertheless, in this paper it is used to mean the “intensity of discussion on a topic”. Language-based methods have long been used to detect awareness. In the literature, there are two macro-approaches to measuring awareness: dictionary-based and lexicon-based approaches [6]. The first category starts with a small set of opinion words and expands the lexicon through bootstrapping, while the second generates the opinion lexicon by learning from the dataset (ibid.). This paper uses a lexicon-based approach: the words used to detect awareness have all been extracted from the data at hand. The reasons behind this choice were (i) the expectation that COVID-19 may have drastically changed the language/hashtag landscape because of its devastating social effects, so that a generalizable lexicon would not be helpful, and (ii) the aim of identifying topics associated with COVID-19 that are not as strongly represented in other contexts (for example, the importance of exercising at home).

The second important methodological component, more widely studied in the literature, is the use of hashtags. Hashtags fulfill different roles on Twitter. One of their functions is contextually marking conversations around a certain topic [13], and awareness can easily be associated with being part of a conversation. Hashtags are also frequently used for “social tagging” [43]; thus, they fulfill the function of motivating others and making them aware of a topic through self-awareness. Huang et al. [33] suggest that hashtags can be organizational (used for organizing resources) or conversational, serving to transmit a message. Similarly, an important feature of hashtags is that they are monolingual and do not translate into other languages; thus, they allow for the association of tweets written in different languages [19].

Hashtags have been used frequently in Twitter-related studies due to their clear message and short one-word structure, which leave little room for contextual interpretation or technical ambiguity in representation. Thus, they are convenient tools for operationalizing awareness. In terms of applications, hashtags are created and used extensively in the context of social protests and have been studied from that perspective [5, 55, 64], as well as a measurement of ideology [4].

The methodological importance of co-occurring hashtags has been noticed by Twitter researchers. Co-occurring hashtags are those that appear among other hashtags in a single tweet or in a collection of tweets (such as all tweets posted by the same Twitter user). A theoretical discussion of the topic has been proposed by Jan Pöschko [54], who classifies hashtags into groups using both the linguistic properties of the hashtag and the social/geographic variables incorporated into a tweet. Relatedness of hashtags is the core operation underlying hashtag clustering, classification, and recommendation [44, 51, 65]. The significance of hashtags for this study is that they provide an opportunity to detect a range of topics that appear alongside certain “core hashtags” that can be used as alternative tags for the pandemic. These core hashtags have been extracted from various non-academic sources online and are believed to be the most frequently used hashtags in the context of the disease; they have also been qualitatively checked by the author in terms of how well they represent the situation. According to the classification of hashtags provided by Recuero et al. (2012), the selected core hashtags are almost always referential; thus, they address the context of what was happening. In addition, Recuero et al. (ibid.) also mention expressive, conative, metalingual, poetic, and phatic hashtags. In most cases, these hashtags have been ignored in this study, since they do not provide any contextual evidence for the range of practical topics that people in Ohio associate with COVID-19.

Other researchers point to the importance of hashtag pairs. In their study of protests in Brazil, Recuero et al. [55] paid particular attention to the grouping of hashtags and further classified the co-occurring hashtags (the individual types of which were mentioned above) in pairs. They believe that co-occurring hashtags may be used to mobilize or motivate people, localize the context, make demands or reinforce an opinion, and characterize information or provide general context. Among the most important groups, conative-conative hashtag pairs are used to mobilize people by strengthening the meaning of the imperative mood. In this paper, this reinforcement has been translated as “awareness about the pandemic” in general. Similarly, another important hashtag pair provided by Recuero et al. (ibid.) is the reference-reference group. In their paper, this group is mostly associated with the localization of tweets, interpreted here as “domestic awareness” and “nationalistic awareness”. Comparably, emotive hashtags have been used to project demands and opinions (about political parties) and to reflect partisanship or political dissent. Lastly, metalingual hashtags provide information about the situation or are used to characterize it. Co-occurrences of metalingual hashtags with core hashtags have been used to determine the level of awareness about the social aspects of life related to COVID-19, economic repercussions, ways of entertainment during the pandemic, and sports in general. A similar classification has been brought forward by Shapp [61], who differentiates between “tag” hashtags used to designate certain events and “commentary” hashtags used to add meaning to the main semantic content of the tweet. The “core hashtags” and “topic hashtags” distinction brought forward in this paper is very similar to the categorization proposed by Shapp.

Awareness and emotions

The connection between awareness and emotions has previously been studied by other scholars. In fact, social media communication systems have been referred to as “social awareness streams” [49]. Thus, they are platforms for indicating awareness about issues important to their users. There is also evidence that these awareness networks are used to share emotions [54]. Sharing of emotions strengthens ties, brings users closer to one another, and allows new ties in the network to form [54]. Similarly, the “emotional broadcaster theory” (EBT) indicates that people are highly motivated to tell others what they think about major events, and posts expressing emotion might contain information relevant to listeners [28, 31]. An example that studies the relationship between the sharing of emotions and the properties of social networks is Kivran-Swaine and Naaman [41], in which the authors investigate the intensity of social media use and people’s tendency to express emotions.

Psychologists further strengthened the connection between awareness and emotions by conceptualizing ‘emotion awareness’ [58]. According to Rieffe et al., emotion awareness is an attentional process in which a person monitors and differentiates between discrete emotions and identifies their elicitors. Emotion is projected simultaneously with or as a response to an awareness situation. Thus, an emotional projection consists of a set of components (interpretation of the situation, physiological changes, tendency for action, motor reaction, subjective experience) [60]. In this regard, this article argues that the emotions projected by Twitter users are tightly connected to different issues they are aware of. It is further believed that the intensity of awareness on a topic is strengthened cyclically when other people are aware of that topic, as well. Greater awareness results in a greater intensity in the projection of emotions, and possibly a clearer distinction between different forms of emotions.

Communicating health through social media

Traditional media outlets such as magazines, newspapers, and TV channels have been losing their communicative strength and their power to set discourse over the last few years. The reason behind this phenomenon is the rise of digital media channels (including social media outlets that serve as alternative news sources) and the need to communicate news in a much timelier manner. The ability of social media channels to disseminate news far more promptly than traditional media outlets has led to a decrease in the popularity of the latter.

Social media offers an opportunity to communicate news on many topics, including health. Researchers indicate that the use of social media and the better communication of health-related news can lead to better health outcomes. Over time, there has been a significant increase in the number of people using social media to provide or seek information on health, share personal experiences regarding diseases, medical treatments, and medications, and communicate with healthcare experts [15, 18, 21, 26, 38]. A systematic review by Moorhead et al. [46] identified six advantages of using social media for health communication: “(1) increased interaction with others, (2) more available, shared, and tailored information, (3) increased accessibility and widening access to health information, (4) peer/social/emotional support, (5) public health surveillance, and (6) potential to influence health policy.”

Another benefit of using social media data can be observed in public health surveillance [46]. Traditionally, public health surveillance relied on the flow of data coming from healthcare providers and pharmacists [66]. Public health data can be used to identify populations suffering from a particular illness (especially important during a pandemic like COVID-19), observe the infection patterns of a disease, and identify events related to medications, vaccinations, and other uses of drugs (ibid.).

Other researchers have looked at the use of social media for health communication by different demographic groups: younger people use social media disproportionately more to disseminate health-related information, but no racial/ethnic disparities have been found [36]. Kite et al. [40] analyzed which types of messages shared by public health organizations the public finds most engaging.

Research questions

An initial examination of the data shows considerably high county-level variation, both temporally and across regions, in the number of COVID-19 cases and deaths in Ohio. The social factors contributing to the spread or prevention of the pandemic, together with the power of social media to measure social dynamics in a context where conventional data collection methods, such as surveys, have become obsolete, provide a unique opportunity to enlist Twitter’s help. In addition, the sharp interventions introduced by state governors to curb the pandemic offer a unique opportunity to analyze the situation in a quasi-experimental setting. In this context, an interesting research question is how different levels of awareness and sentiment about COVID-19-related issues may influence the number of cases and deaths in a county.

The methodological setup of this paper has been structured according to the policies implemented by Governor DeWine. It is assumed and expected that there will be a contagion effect among some of these policies (a list of policies is given in “Data”). Also, some of the policies provide a higher incentive for self-isolation and create greater cognizance of the pandemic in general. To minimize these effects and make the results easily interpretable, this paper offers a twofold analysis, looking at the days before the first case and the days after the stay-at-home order.

With this background in mind, the paper aims to fill several geographical, substantive, and analytical gaps in the literature. In terms of substantive contributions, research on COVID-19 is still at an early stage, and a social study of a region that is highly representative of the United States, a country heavily affected by the pandemic, could be of value to scholars and policymakers worldwide. In a similar vein, this paper aims to extend the regional policy literature. Methodologically, the paper expands the NLP literature on awareness and text similarity by bringing the sentiment and emotion component into the picture, an issue that has largely been neglected so far. Lastly, the study offers a regionally focused, unique dataset compiled in the early stages of the pandemic, offering an opportunity for new research in the field.

This paper answers four related research questions using levels of awareness and emotions extracted from Twitter:

Q1. Is pre-first-case awareness about COVID-19 associated with the number of post-stay-at-home-order cases and deaths?

Q2. Is post-stay-at-home-order awareness about COVID-19 associated with the number of cases and deaths?

Q3. Is being in a certain pre-first-case mood related to the number of post-stay-at-home-order cases and deaths?

Q4. Is being in a certain post-stay-at-home-order mood related to the number of cases?

As one can see, there are two main questions, investigated with two different datasets. The first expectation is that Ohioans started to form an awareness of COVID-19 before the pandemic arrived in the continental US, and that this awareness had an impact on the number of cases and deaths experienced after the stay-at-home order. The second expectation is that it is difficult to determine the relationship between social awareness/mood and the number of cases during the policy implementation phase (due to endogeneity) and in the pre-policy period (due to the low number of cases); the post-policy period offers a better opportunity to analyze those dynamics. The methodological aspects of these choices are explained in more detail in the next section.

Data

The data for the project have been obtained in several steps and from several sources. To collect a representative sample of users residing in Ohio, the tweet activity of users who self-identified as living within the geographical coordinates of the state has been live-streamed. The collection was done using the rtweet package of the R statistical language (rtweet.info) over roughly three days in March 2020. This resulted in the identification of 177,351 users with addresses in Ohio. Among those users, 129,815 self-identified at a county-level address. Their formal addresses were identified using the Nominatim module, which uses OpenStreetMap data as its source of information (openstreetmap.org). The BotOrNot algorithm [17], implemented through the Botometer module of the Python language, was then used to screen those users, returning results for 105,618 of them. After this pre-processing step and the merging of datasets, 91,096 users remained. Restricting the sample to users who used hashtags in their tweets eliminated further users, bringing the final number of users in the sample to 48,291.
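As a minimal sketch of this screening step, the snippet below shows how the county resolution and bot filtering might look in Python, using the geopy wrapper around Nominatim and the botometer package. The credential placeholders, the 0.5 automation-probability cutoff, and the helper names are illustrative assumptions rather than values taken from the paper.

import botometer
from geopy.geocoders import Nominatim

# Placeholder credentials; Botometer is assumed to be accessed via RapidAPI.
twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key="...",
                          **twitter_app_auth)

geolocator = Nominatim(user_agent="ohio-covid-study")  # OpenStreetMap geocoder

def county_of(profile_location):
    # Resolve a free-text profile location to an Ohio county, if any.
    loc = geolocator.geocode(profile_location, addressdetails=True)
    if loc is None:
        return None
    address = loc.raw.get("address", {})
    if address.get("state") != "Ohio":
        return None
    return address.get("county")

def looks_human(screen_name, cap_threshold=0.5):
    # Keep accounts whose complete-automation probability (CAP) is low;
    # the 0.5 cutoff is an assumption for illustration.
    result = bom.check_account("@" + screen_name)
    return result["cap"]["universal"] <= cap_threshold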

After the identification of users, their tweet histories were downloaded using the Tweepy module of the Python language (tweepy.org), which provided a total of 188,577,209 tweets. Since the pandemic entered our lives on the very last day of 2019, all tweets dated 2019 or earlier were discarded, leaving 46,078,750 tweets. After additionally discarding retweets and tweets with no hashtags, 16,753,733 tweets remained. The resulting sample of tweets with hashtags is biased towards urban centers to some degree; nevertheless, the proportion of tweets coming from different counties is similar to the distribution of the population within the state of Ohio. A comparison of populations and sample sizes is provided in the county-level maps (Fig. 1).
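A sketch of the timeline download and the retweet/hashtag filtering, assuming Tweepy's standard-API cursor interface; the helper name and the per-user generator design are illustrative, and the standard API's own timeline limits would apply in practice.

import tweepy

auth = tweepy.OAuthHandler("consumer_key", "consumer_secret")
auth.set_access_token("access_token", "access_token_secret")
api = tweepy.API(auth, wait_on_rate_limit=True)

def hashtag_tweets(screen_name):
    # Yield a user's 2020 tweets that carry at least one hashtag,
    # skipping retweets, mirroring the filtering described above.
    cursor = tweepy.Cursor(api.user_timeline,
                           screen_name=screen_name,
                           count=200,
                           tweet_mode="extended",
                           include_rts=False)
    for status in cursor.items():
        if status.created_at.year < 2020:
            break  # timelines arrive newest-first
        tags = [h["text"].lower() for h in status.entities["hashtags"]]
        if tags:
            yield status.created_at, status.full_text, tags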

Fig. 1 A comparison of sample size and population

The county-level data on the number of cases (positive for those infected, negative for those who recovered) and the number of deaths due to COVID-19 have been obtained from the Ohio Department of Health (https://coronavirus.ohio.gov/). The data collection starts on January 1, 2020, and ends at the end of April 2020.

The examples below show the variation in the number of cases and deaths across Ohio counties during the crisis. The ridgeline graph (Fig. 2) shows the distribution of cases per person in five urban centers in Ohio, ranked from the cities in the best condition to those in the worst. Dayton shows the best performance, as its distribution has the largest mass at zero cases per person. Contrastingly, Toledo seems to have the worst performance in terms of preventing the spread of the virus.

Fig. 2 Distribution of total cases per capita/day in major areas of population

The variation in the number of cases across counties can be seen in Fig. 3, which compares the growth rate of the pandemic to the number of cases per person in each county. The numbers indicate that Marion County (a mostly rural county north of Columbus, Ohio) and Pickaway County (a semi-rural county south of Columbus, Ohio) have the worst performance in fighting the pandemic.

Fig. 3 Growth rate and number of cases per capita

As previously noted, the methodological setup of this paper is based on the COVID-19-related policy implementations of the Governor of Ohio, Mike DeWine. Over the course of the pandemic (the start of which can be taken as January 1, 2020), DeWine implemented seven state-wide policies, the most important of which is the last one, the stay-at-home order (KFF.org). An overview of the policies can be found in Table 1.

Table 1 COVID-19 policies implemented by Governor Mike DeWine

Methodology

The methodological choices made for the paper can be grouped under three headings: (i) measurement of awareness and extraction of emotions from the Twitter data, (ii) reshaping of the data for analytical purposes, and (iii) model specification and statistical assumptions. The data-specific and methodological pipeline used in the paper can be seen in Fig. 4.

Fig. 4 Data pipeline

Following the data collection stage, the initial step was the identification of “core hashtags” to detect COVID-19-related discussion on social media. These hashtags have been collected from various informal websites that provide hashtag-tracking services. In total, 18 websites were analyzed to form a collection of possible core hashtags. Some examples include #corona, #virus, #influenza, #covid19, and #wuhanvirus. The complete list of 94 core hashtags can be found in the Appendix. The tweets that include at least one core hashtag were used to extract co-occurring hashtags, which ultimately resulted in the identification of COVID-19-related topics. To identify the “trending topics”, co-occurring hashtags with a count of 100 or above were selected (a total of 3252 hashtags), and each hashtag was manually assigned to a topic. For the detection of topics, a few unsupervised clustering options were considered, such as Latent Dirichlet Allocation [8], Non-Negative Matrix Factorization [50], and Louvain-modularity-based clustering on co-occurring hashtag networks [9]; however, none of them gave results as accurate as the hand-coded topic identification. Some hashtags (such as #life, #today, #goals) could not be clearly associated with a “general topic” and were therefore ignored. This left 1951 hashtags that were used to create 20 different groups of awareness. The awareness topics identified are the following: Social, Nationalistic, Entertainment, Sports, Race, Economy, Foreign, Religion, Health Technology, Democrats—Love, Democrats—Hate, Republicans—Love, Republicans—Hate, Domestic, Illness (a greater variety of hashtags about COVID-19), Ideology, Education, and Gender. A full list of the hashtags and associated topics can be found in the attachment to the paper.
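The co-occurrence extraction described above can be expressed compactly. The following sketch counts hashtags that appear alongside at least one core hashtag and applies the count-of-100 threshold; the abbreviated core list and the function name are illustrative.

from collections import Counter

CORE = {"corona", "virus", "influenza", "covid19", "wuhanvirus"}  # abbreviated core list

def co_occurring(tweet_hashtag_lists, min_count=100):
    # Count hashtags appearing alongside at least one core hashtag,
    # then keep those seen 100 or more times, as described above.
    counts = Counter()
    for tags in tweet_hashtag_lists:
        tagset = set(tags)
        if tagset & CORE:
            counts.update(tagset - CORE)
    return {tag: n for tag, n in counts.items() if n >= min_count}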

On a related note, this paper uses a “supervised” bag-of-words approach to score hashtag topics on a continuous scale. To measure awareness levels, a normalized version of cosine similarity (typically used to measure text similarity; for an example, see Bird et al. [7]) and Jaccard similarity (as a validation check) have been used. Thus, the hashtags extracted for each topic and the co-occurring hashtags extracted from tweets have been compared. The formulas for the normalized cosine and Jaccard similarities are given below.

$$V_{1}: \text{vector of hashtags in the tweet}$$
$$V_{2}: \text{vector of hashtags in the bag of topic words}$$
$$S_{1}: \text{set of hashtags in the tweet}$$
$$S_{2}: \text{set of hashtags in the bag of topic words}$$
$$\text{Normalized cosine similarity} = \frac{\max\left(\text{cos. similarity}\right) - \frac{V_{1} \cdot V_{2}}{\lVert V_{1} \rVert \, \lVert V_{2} \rVert}}{\max\left(\text{cos. similarity}\right) - \min\left(\text{cos. similarity}\right)}$$
$$\text{Normalized Jaccard similarity} = \frac{\max\left(\text{Jac. similarity}\right) - \frac{\left| S_{1} \cap S_{2} \right|}{\left| S_{1} \cup S_{2} \right|}}{\max\left(\text{Jac. similarity}\right) - \min\left(\text{Jac. similarity}\right)}.$$
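For concreteness, a small Python rendering of these measures is given below. The min-max rescaling mirrors the formulas as written above (note the max-minus-score numerator), and the function names are illustrative.

import numpy as np

def cosine(v1, v2):
    # Raw cosine similarity between two hashtag count vectors.
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    return float(np.dot(v1, v2)) / denom if denom else 0.0

def jaccard(s1, s2):
    # Raw Jaccard similarity between two hashtag sets.
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def normalized(scores):
    # Min-max rescaling following the paper's notation as written.
    lo, hi = min(scores), max(scores)
    return [(hi - s) / (hi - lo) if hi > lo else 0.0 for s in scores]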

To better illustrate the type of data used, Fig. 5 shows a network of co-occurring hashtags created based on Zipf’s law (see Note 1).

Fig. 5 Co-occurring hashtags network

In the literature, there is criticism of using similarity scores on their own to calculate awareness levels. Budanitsky and Hirst [14] introduce the concept of semantic relatedness, indicating that two syntactically irrelevant words (such as hot and cold, or car and wheel) may be used together to underline the same meaning. To solve this problem, scholars have developed “sophisticated” similarity measures. Examples include CosText [51], which looks at the cosine similarity of tweets containing the hashtags in question rather than the similarity between groups of hashtags; a labeled LDA model where hashtags are used as labels for the tweets in comparison [44]; CosEntity [24], which constructs a bipartite graph between hashtags and entities in a tweet; and Top-k Relatedness (ibid.), which looks at the relatedness of the entities previously mentioned.

As previously mentioned and acknowledged by other works in the literature [24], tweets are very short and noisy and generally contain very few hashtags. Although not believed to be systematic in nature, this creates two limitations for this paper: (i) a considerable percentage of the data is lost when extracting tweets that can be analyzed through hashtags, and (ii) sentiment analysis and emotion classification likely perform less well than they would in a controlled study.

Based on the discussion in the paragraphs above, this paper uses a combination of more traditional types of comparison (such as cosine and Jaccard similarity) along with sentiment analysis and emotion classification, for a few reasons. First, using data from a limited geography and a limited timeframe helps avoid contextual confusion resulting from temporal characteristics. Second, although unsupervised topic detection and awareness calculation methods work reasonably well, manually labeling around 2000 hashtags is believed to provide higher precision in topic detection. Third, non-continuous, classification-based methods lead to great data loss, since they associate each hashtag/tweet with a single label and therefore leave no room for overlapping classification or continuous measurement. Lastly, mathematically more involved similarity measures are harder to interpret, whereas traditional methods offer a more intuitive explanation of effect size.

As noted, the contribution of the ‘awareness calculation’ in this paper is that it controls for the sentiments and emotions in a statement. Thus, the second component of the data collection process for the explanatory variables was an “emotion classifier” trained on tweets. The main reasoning behind this choice was to add a sentiment dimension to the awareness calculation. In other words, since awareness about an issue is operationalized through psychological behavior, sentiments are thought to be a nuanced contribution to understanding which social metrics are particularly associated with success in fighting the pandemic.

The emotion model classifies five different emotions: sadness, happiness, anger, hate, and neutrality. The choice of five emotions rests on two criteria: (i) the availability of labeled emotion datasets, and (ii) the ongoing discussion on the definition of “basic emotions” started by Paul Ekman in the early 1990s [23]. Ekman originally proposed seven basic emotions; however, this number has declined over the years, as the expressions for some emotions are more similar to each other than they are to others. Notable works that provide a commendable summary of this debate include Jack et al. [37] and Gu et al. [30], among others.

To classify the emotions, a series of deep learning algorithms have been used. An overview of the model can be seen in the pipeline diagram (Fig. 6). To predict the emotions in the tweets, GloVe vectors pre-trained on 2 billion tweets have been used (Pennington et al., 2014). As the accuracy table indicates, the model performs with an average accuracy of 71% (Table 2). Modern algorithms for multinomial emotion classification in related papers have accuracy levels ranging from 61.63% to 85% [32] (Table 3). Hence, the performance achieved is moderately high and acceptable for the data at hand.

Fig. 6 Emotion classification pipeline

Table 2 Accuracy table for emotion classification
Table 3 Examples for emotion classification from COVID-19-related tweets
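The paper reports the classifier only at the level of the pipeline diagram and the accuracy tables, so the following is a minimal sketch of one plausible architecture: frozen pre-trained GloVe Twitter embeddings feeding a bidirectional LSTM with a five-way softmax. The layer sizes, vocabulary cap, and sequence length are assumptions for illustration, not the paper's settings.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

VOCAB, DIM, MAXLEN, N_EMOTIONS = 20000, 100, 40, 5  # assumed sizes

def glove_matrix(path, word_index):
    # Build an embedding matrix from GloVe Twitter vectors on disk.
    matrix = np.zeros((VOCAB, DIM))
    with open(path, encoding="utf8") as fh:
        for line in fh:
            word, *vec = line.split()
            i = word_index.get(word)
            if i is not None and i < VOCAB:
                matrix[i] = np.asarray(vec, dtype="float32")
    return matrix

def build_classifier(embedding_matrix):
    model = Sequential([
        Embedding(VOCAB, DIM, weights=[embedding_matrix],
                  input_length=MAXLEN, trainable=False),
        Bidirectional(LSTM(64)),
        Dense(N_EMOTIONS, activation="softmax"),  # sad, happy, angry, hate, neutral
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model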

Following awareness measurement and emotion classification, the data have been shaped to answer the two sets of questions indicated in the research questions section. The linear regression model looks at the relationship between pre-first-case awareness and emotions and the post-stay-at-home-order number of cases and deaths. For the pre-first-case part, the January 1–March 9 period has been chosen. The post-stay-at-home-order period covers March 23–April 30. The days between the two periods (March 10–22) have been ignored, since, as previously mentioned, the implementation of many policies in that time frame creates policy contagion and simultaneity between the number of cases and awareness/emotion levels.

In the pre-first-case dataset, each county i is represented as a single, aggregated observation: a set of explanatory variables with average awareness and emotion-ratio levels computed from pre-first-case observations, and average cases and deaths after the stay-at-home policy was implemented. The post-stay-at-home-order dataset, used to analyze the relationship between post-stay-at-home-order awareness and emotion dynamics and the number of cases/deaths, has been configured in panel format. Thus, each observation is the average of values collected from one of the 88 counties in Ohio on a specific day (there are 119 days in total).
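A sketch of the county-day panel construction with pandas; the input file and column names are hypothetical stand-ins for the tweet-level scores described above.

import pandas as pd

# Hypothetical tweet-level frame: one row per tweet, with county, timestamp,
# per-topic awareness scores, and a predicted emotion label.
tweets = pd.read_csv("ohio_tweets_scored.csv", parse_dates=["created_at"])
tweets["day"] = tweets["created_at"].dt.date

# County-day panel: average awareness scores and emotion ratios per cell.
panel = (
    tweets.groupby(["county", "day"])
          .agg(social_awareness=("social_score", "mean"),  # one of the 20 topics
               sad_ratio=("is_sad", "mean"))               # share of tweets classified as sad
          .reset_index()
)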

For the first dataset, pooled OLS has been used for the empirical analysis, since it is believed that cross-state disease spread, as well as policy contagion, is non-existent in that period. For the second dataset, a dynamic panel estimator has been used to address the cross-time contagion problem regarding cases and deaths, and two panel estimators have been considered: System-GMM [2] and Difference-GMM [1]. System-GMM has been favored over Difference-GMM, since the latter is believed to have poor finite-sample properties (in other words, bias) when the series are highly persistent [10]. In fact, the dataset at hand is an example where the regressors are strongly persistent, as the number of cases/deaths on a previous day can be a very accurate sign of what the outcome of the pandemic will be on the next day. Finally, the Akaike and Bayesian Information Criteria have been calculated to determine the optimal number of lagged variables. The results for the AR(1) and AR(2) processes were quite close, and ultimately AR(1) has been selected for better interpretability of the results. In the end, four different empirical models were created (see Note 2). The formulas are provided below.

$$\text{(i)}\quad \text{PL Average Cases or Deaths}_{i} = \beta_{0} + \beta_{1}\,\text{Total county population}_{i} + \beta_{2:5}\,\text{PF Average Emotion Ratio}_{i} + \varepsilon_{i}$$
$$\text{(ii)}\quad \text{PL Average Cases or Deaths}_{i} = \beta_{0} + \beta_{1}\,\text{Total county population}_{i} + \beta_{2:20}\,\text{PF Average Awareness Score}_{i} + \beta_{21}\,\text{PF Average Positivity Score}_{i} + \beta_{22}\,\text{PF Average Negativity Score}_{i} + \varepsilon_{i}$$
$$\text{(iii)}\quad \text{PL Daily Cases or Deaths}_{i,t} = \eta_{i} + \chi_{t} + \phi_{1}\,\text{PL Daily Cases or Deaths}_{i,t-1} + \phi_{2:5}\,\text{PL Average Emotion Ratio}_{i,t-1} + \varepsilon_{i,t}$$
$$\text{(iv)}\quad \text{PL Daily Cases or Deaths}_{i,t} = \eta_{i} + \chi_{t} + \phi_{1}\,\text{PL Daily Cases or Deaths}_{i,t-1} + \phi_{2:20}\,\text{PL Average Daily Awareness Score}_{i,t} + \phi_{21:39}\,\text{PL Average Daily Awareness Score}_{i,t-1} + \phi_{40}\,\text{PL Daily Positivity Score}_{i,t} + \phi_{41}\,\text{PL Daily Positivity Score}_{i,t-1} + \phi_{42}\,\text{PL Daily Negativity Score}_{i,t} + \phi_{43}\,\text{PL Daily Negativity Score}_{i,t-1} + \varepsilon_{i,t}$$

i: county (one of the 88 counties in Ohio); t: day (one of the 119 days in the analysis window); PF: pre-first-case (January 1–March 9 window); PL: post-lockdown (March 23–April 30 window). Models (i) and (ii) are linear regression models relating PF awareness and emotions to the PL average number of cases and deaths; models (iii) and (iv) are System-GMM dynamic panel models relating PL awareness and emotions to the PL daily number of cases and deaths (see Note 3).
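As an illustration of model (i), the pooled OLS can be estimated with statsmodels' formula interface; the file and variable names below are hypothetical, and neutrality is omitted as the reference emotion, as in Note 2. The System-GMM models (iii) and (iv) would require a dedicated dynamic-panel estimator and are not sketched here.

import pandas as pd
import statsmodels.formula.api as smf

# One row per county: PF-period averages plus PL-period outcomes.
counties = pd.read_csv("ohio_county_aggregates.csv")

# Model (i): PL average cases on county population and the four
# PF emotion ratios (neutrality omitted as the reference category).
ols_i = smf.ols(
    "pl_avg_cases ~ population + pf_sad + pf_happy + pf_angry + pf_hate",
    data=counties,
).fit()
print(ols_i.summary())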

Despite the careful selection of models and the effort to obtain highly representative data, a few drawbacks of the chosen datasets and design need to be mentioned: a very small number of counties had no cases or deaths for some portion of the post-stay-at-home period, which decreases the variation in the dependent variable (nevertheless, this does not prevent the invertibility of the Hessian). In addition, as previously indicated, neither the emotion classification model nor, expectedly, the calculation of the awareness scores provides a perfect operationalization of the concepts at hand.

Descriptive findings

A first descriptive look at the data suggests that it is difficult to clearly measure the impact of the different policies implemented by Governor DeWine. In fact, most of the policies were implemented in a period when Ohio did not have many cases. As seen in Fig. 7, the policies, shown with vertical lines, are not followed by a lagged decrease in cases and/or deaths. Thus, as highlighted in the methodological setup, it is theoretically more useful to look at factors that are associated with COVID-19 while controlling for the policy effects (see Note 4).

Fig. 7 Cases and deaths in Ohio
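Figure 7 can be reproduced in outline with matplotlib by overlaying policy dates as vertical lines on the case and death series; the file and column names are placeholders, and only the stay-at-home date stated in the paper is marked.

import pandas as pd
import matplotlib.pyplot as plt

cases = pd.read_csv("ohio_daily_counts.csv", parse_dates=["date"])  # hypothetical file

fig, ax = plt.subplots()
ax.plot(cases["date"], cases["cases"], label="daily cases")
ax.plot(cases["date"], cases["deaths"], label="daily deaths")
# Stay-at-home order (March 23, 2020); the further policies from Table 1
# could be added the same way.
ax.axvline(pd.Timestamp("2020-03-23"), linestyle="--", alpha=0.6)
ax.set_xlabel("date")
ax.set_ylabel("count")
ax.legend()
plt.show()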

The distribution of the explanatory variables, on the other hand, is shown in Figs. 8 and 9, which demonstrate the variation in awareness and emotions over time. Figure 8 shows the average normalized cosine similarity aggregated across different topics. Figure 9 shows the ratio of emotions on a given day. With regard to awareness, it is evident that people worried much more about politics and much less about the economy before the state-wide policies were implemented. Similarly, Ohioans chatted the most about the pandemic just before the number of cases and deaths peaked. In addition, there was a considerable amount of discussion about the pandemic before the first case in Ohio. This is in accordance with expectations: before the economic impact of COVID-19 was felt by society, there was a lot of political discourse about how best to deal with the pandemic. After the impact hit Ohio quite strongly, the number of cases surged, many businesses closed, and the structure of workforce participation changed considerably; this may have led to a stronger discussion of the economic aspects of the situation. Looking at the distribution of emotions over time in Fig. 9, the variation is not as great; however, one can notice that during the March 9–March 23 period, when the state-wide policies were quickly introduced, the ratio of tweets classified as “sad” slightly increased.

Fig. 8 Awareness about different topics over time

Fig. 9 Emotion distribution over time

In summary, particularly for the awareness scores, the data at hand show great promise for empirical analysis: there is variation across counties, and there is also variation across time. Similarly, as is evident from the data, state-wide policies may be one factor behind the change in awareness scores and the variation in emotions. Lastly, it makes sense to look at awareness and emotions separately, since the effect of policies, if there is any, is comparably lower on emotions than on the set of awareness scores. The emotional content of the tweets may vary less than awareness, since most COVID-19-related tweets contain a greater amount of factual (or non-factual) information but less emotional content. This is believed to result from a complex set of causes, including culture and the demographic and socio-economic backgrounds of Twitter users in Ohio.

Empirical results

The empirical findings of the study are provided in Tables 4, 5, 6, 7 and 8. To show the effect sizes of the different regressors, a heatmap is used. The results obtained from the four models tell a story that can best be explained by the existing ideological divides within American society. Controlling for the number of cases and deaths, the most striking finding is that there seems to be a considerable shift in the levels of awareness and emotions reflected by the people affected by COVID-19 when the pre-first-case and post-stay-at-home datasets are compared.

Table 4 Pre-first case awareness (X) and post-stay-at-home cases and deaths (Y)
Table 5 Pre-first case emotions (X) and post-stay-at-home cases and deaths (Y)
Table 6 Post-stay-at-home awareness (X) and post-stay-at-home cases (Y)
Table 7 Post-stay-at-home awareness (X) and post-stay-at-home deaths (Y)
Table 8 Post-stay-at-home emotions (X) and post-stay-at-home cases and deaths (Y)

Looking at the pooled-OLS model reporting the association between pre-first-case awareness scores and emotions (X) and the post-stay-at-home number of cases and deaths (Y), one can see that people who are opposed to Republican symbols and ideology (Republicans—Hate) have experienced a lower number of cases. In addition, people who frequently talk about COVID-19 have experienced a smaller number of deaths, and people who discuss sports had a higher number of deaths on average. Among all significant variables, Republicans—Hate stands out with its large effect size. This result can best be explained by the general consensus established in the previous months that more liberal segments of the population are more sensitive to protection against the disease and the prevention of its spread, given that Ohio is a “swing state” with fierce electoral competition between Republicans and Democrats (Pew Research Center, June 25, 2020). The reason why “sadder” segments of the society have experienced lower numbers of cases is less clear; nevertheless, this result can still be tied to the stereotypical Democratic vs. Republican interpretation of the pandemic. It is quite likely that the parts of the population that are more empathetic towards people affected by COVID-19 developed a grimmer outlook in the earlier phases of the pandemic.

The System-GMM models used for the post-stay-at-home dataset tell a different story. In this case, statistically, all awareness scores and emotion ratios (X) are significantly correlated with the number of cases and deaths (Y), controlling for the lagged independent and dependent variables as well as the sentiment component of the tweets. Ranked by effect size, awareness about (i) health technology, (ii) domestic issues, and (iii) opposition to Republican symbols and ideology are the top three regressors significantly positively correlated with the number of cases and deaths. Contrastingly, awareness about (i) the foreign aspects of the pandemic, (ii) support for the Republican Party and its symbols, and (iii, tied) the social and nationalistic aspects of COVID-19 are significantly negatively correlated with the number of cases and deaths. Looking at the emotions, “being happy” is associated with a lower number of cases and deaths.

A comparison between the two groups of results shows contrasting patterns, possibly hinting that COVID-19 may have changed social and political perceptions in the population. As expected, people who tweet about possible ways of ending the pandemic are those who have experienced the pandemic in their close communities; thus, they are associated with more cases and deaths. More interestingly, however, counties that overwhelmingly oppose Republicans are associated with a higher number of cases and deaths in the post-stay-at-home period. This is likely due to the comparably poor response of the United States to the pandemic (Foreign Policy, April 1, 2020) and reflects the shift in the approval rating of the government. Even more interestingly, people who show overwhelming support for the government are associated with a lower number of cases and deaths. And, again, this is likely a result of the “rally around the flag” effect [3], as evidenced by the nearly 10% increase in President Trump’s approval rating in the initial phases of the pandemic (Gallup.com). Thus, some people in the sample withdrew their support and were replaced by others. The findings also suggest that people with a more global awareness of the pandemic, who also care about their country in a nationalistic way, are associated with a lower number of cases and deaths. Also, expectedly, counties less heavily affected during the lockdown period feel happier on average.

Conclusion and policy implications

This paper investigates the relationship between the awareness and sentiment of people in a region that is highly representative of the United States and the effects of COVID-19 on its people. The most important finding is that COVID-19 as a process has changed the awareness and social perceptions of people on COVID-19-related issues as the pandemic has progressed. Specifically, the segments of society that were least hard hit by COVID-19 were associated with opposition to Republican symbols in the initial phases of the pandemic; the same group is associated with a higher number of cases and deaths during the peak phase. My explanation for this shift is that the “rally around the flag” effect was replaced by the perceived lackluster performance of the government once the effects of the pandemic became more serious. Additionally, another important finding is that a global perspective on the issue seems to be correlated with better COVID-19 outcomes.

The more important and more difficult question is: Can or should policymakers and/or innovators react to these findings? The answer is, probably, yes. As the paper is yet another indication that America is politically divided, policymakers can benefit from focusing on two strategies. First, policymakers should react to new developments in a timely manner and, therefore, not wait for a politically or populistically motivated response to grow. If factual information is brought forward promptly, the public will have more time to analyze and deliberate on the results and will likely evaluate political and populistic statements more critically. Second, the results indicate that certain COVID-19-related topics, such as social life and entertainment, are associated with higher cases and deaths. It is a human need to be in close proximity with others and socialize, and this need would grow even larger in a worse health crisis requiring further isolation. Thus, the second goal should be to devise innovative policies to satisfy social needs.

These findings also contribute to our understanding of the current global health crisis and its likely consequences. First, people’s relationship with their government seems to be a good indicator of how successfully they can deal with an extreme event. Second, the findings reinforce the idea that crisis situations reshuffle perceptions in a society and can have political consequences for the government. To the extent that voters share this assessment, governments with poor COVID-19-related outcomes may weaken in the coming years, especially in prolonged crisis situations; this is important to keep in mind for populist governments that have performed quite poorly in the pandemic (New York Times, June 2, 2020). This study has implications for policymakers as well: party ideologies will likely be shaped by even greater ideological divides and a growing gap between parties on technical issues. Political differences will grow if outcomes continue to be difficult to measure objectively or become clear only in the long term.

Notes

  1. After the co-occurring hashtags were ranked by count from highest to lowest, the number of hashtags to include was chosen at the point where the slope change in the count ranking is greatest, for easier visualization. This gave a total of 647 hashtags. The law is named after the American linguist George Kingsley Zipf [53].

  2. Groups of awareness and emotion variables have been regressed on (i) cases and (ii) deaths separately. Sentiment scores and the total county population have been used as controls in the regressions that look at awareness. For the regressions on emotions, neutrality has been selected as the reference category; similarly, for the calculation of sentiment scores, neutrality has again been assigned as the reference category.

  3. In these models, population has not been used as a control variable, since it is confounded with the lagged variable for number of cases and deaths.

  4. To measure the effects of policies, using a different dataset, such as data on social distancing (as an intermediate variable), could be useful.

References

  1. Arellano, M., & Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58(2), 277–297.

  2. Arellano, M., & Bover, O. (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics, 68(1), 29–51.

  3. Baker, W. D., & Oneal, J. R. (2001). Patriotism or opinion leadership? The nature and origins of the “rally’round the flag” effect. Journal of Conflict Resolution, 45(5), 661–687.

  4. Barash, V., & Kelly, J. (2012). Salience vs. commitment: Dynamics of political hashtags in Russian Twitter. (pp. 2012–2019). Berkman Center Research Publication.

  5. Bastos, M. T., Raimundo, R. L. G., & Travitzki, R. (2013). Gatekeeping Twitter: Message diffusion in political hashtags. Media, Culture & Society, 35(2), 260–270.

  6. Beigi, G., Hu, X., Maciejewski, R., & Liu, H. (2016). An overview of sentiment analysis in social media and its applications in disaster relief. In: Sentiment analysis and ontology engineering (pp. 313–340). Springer.

  7. Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with python: analyzing text with the natural language toolkit. O’Reilly Media Inc.

  8. Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003) Latent dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

  9. Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

  10. Blundell, R., & Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics, 87(1), 115–143.

  11. Boulos, M. N. K., & Geraghty, E. M. (2020). Geographical tracking and mapping of coronavirus disease COVID-19/severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic and associated events around the world: how 21st century GIS technologies are supporting the global fight against outbreaks and epidemics. Int J Health Geogr., 19(1), 8. https://doi.org/10.1186/s12942-020-00202-8.

  12. Bourdieu, P. (1977). The economics of linguistic exchanges. Information (International Social Science Council), 16(6), 645–668. https://doi.org/10.1177/053901847701600601.

  13. Boyd, D., Golder, S., & Lotan, G. (2010). Tweet, tweet, retweet: Conversational aspects of retweeting on twitter. In: 2010 43rd Hawaii International Conference on System Sciences, pp. 1–10.

  14. Budanitsky, A., & Hirst, G. (2006). Evaluating wordnet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.

  15. Chretien, K. C., & Kind, T. (2013). Social media and clinical care: ethical, professional, and social implications. Circulation, 127(13), 1413–1421.

  16. Cinelli, M., Quattrociocchi, W., Galeazzi, A., Valensise, C. M., Brugnoli, E., Schmidt, A. L., Zola, P., Zollo, F., & Scala, A. (2020). The COVID-19 social media infodemic. arXiv:2003.05004. http://arxiv.org/abs/2003.05004

  17. Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016). Botornot: A system to evaluate social bots. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 273–274.

  18. DeAndrea, D. C., & Vendemia, M. A. (2016). How affiliation disclosure and control over user-generated comments affects consumer health knowledge and behavior: A randomized controlled experiment of pharmaceutical direct-to-consumer advertising on social media. Journal of Medical Internet Research, 18(7), e189.

  19. DeArmas, N. (2018). Using hashtags to disambiguate aboutness in social media discourse: A case study of #OrlandoStrong. Electronic Theses and Dissertations, 2004–2019. https://stars.library.ucf.edu/etd/6182. Accessed 28 June 2020.

  20. Denef, S., Bayerl, P. S., & Kaptein, N. A. (2013). Social media and the police: Tweeting practices of British police forces during the August 2011 riots. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 3471–3480.

  21. Dizon, D. S., Graham, D., Thompson, M. A., Johnson, L. J., Johnston, C., Fisch, M. J., & Miller, R. (2012). Practical guidance: The use of social media in oncology practice. Journal of Oncology Practice, 8(5), e114–e124.

    Article  Google Scholar 

  22. Ehnis, C., Bunker, D. (2012). Social media in disaster response:queensland police service—public engagement during the 2011 floods. In ACIS 2012 Proceedings, January 1, 2012. https://aisel.aisnet.org/acis2012/107

  23. Ekman, P. (1992) Are there basic emotions? Psychological Review 99(3), 550–553. https://doi.org/10.1037/0033-295X.99.3.550.

    Article  Google Scholar 

  24. Ferragina, P., Piccinno, F., & Santoro, R. (2015). On analyzing hashtags in Twitter. In: Ninth International AAAI Conference on web and social media.

  25. Fung, I., Tse, Z., Cheung, C.-N., Miu, A., & Fu, K. (2014). Ebola and the social media. The Lancet, 384, 2207. https://doi.org/10.1016/S0140-6736(14)62418-1.

    Article  Google Scholar 

  26. Fung, I.C.-H., Tse, Z. T. H., & Fu, K.-W. (2015). The Use of Social Media in Public Health Surveillance. Western Pacific Surveillance and Response Journal WPSAR, 6(2), 3.

    Article  Google Scholar 

  27. Gallotti, R., Valle, F., Castaldo, N., Sacco, P., & Domenico, M. D. (2020). Assessing the risks of “infodemics” in response to COVID-19 epidemics. MedRxiv. https://doi.org/10.1101/2020.04.08.20057968.

    Article  Google Scholar 

  28. Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying online social networks. Journal of Computer-Mediated Communication. https://doi.org/10.1111/j.1083-6101.1997.tb00062.x.

    Article  Google Scholar 

  29. Goolsby, R. (2010). Social media as crisis platform: The future of community maps/crisis maps. ACM Transactions on Intelligent Systems and Technology (TIST), 1(1), 1–11.

    Article  Google Scholar 

  30. Gu, S., Wang, F., Yuan, T., Guo, B., & Huang, J. H. (2015). Differentiation of primary emotions through neuromodulators: Review of literature. International Journal of Neurology Research, 1(2), 43–50.

    Article  Google Scholar 

  31. Harber, K. D., & Cohen, D. J. (2005). The emotional broadcaster theory of social sharing. Journal of Language and Social Psychology, 24(4), 382–400.

    Article  Google Scholar 

  32. Haryadi, D., Kusuma, G.P. (2019). Emotion detection in text using nested long short-term memory. 11480 (IJACSA) International Journal of Advanced Computer Science and Applications 10(6).

  33. Huang, J., Thornton, K. M., & Efthimiadis, E. N. (2010). Conversational tagging in Twitter. In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, 173–178.

  34. Hughes, Amanda L., St. Denis, L. A., Palen, L., & Anderson, K. M. (2014). Online public communications by police & fire services during the 2012 Hurricane Sandy. In: Proceedings of the SIGCHI Conference on human factors in computing systems, pp. 1505–1514.

  35. Hughes, A. L., & Palen, L. (2009). Twitter adoption and use in mass convergence and emergency events. International Journal of Emergency Management, 6(3–4), 248–260.

    Article  Google Scholar 

  36. Huo, Ji., Desai, R., Hong, Y.-R., Turner, K., Mainous, A. G., & Bian, J. (2019). Use of social media in health communication: Findings from the Health Information National Trends Survey 2013, 2014, and 2017. Cancer Control: Journal of the Moffitt Cancer Center. https://doi.org/10.1177/1073274819841442.

    Article  Google Scholar 

  37. Jack, R. E., Garrod, O. G., & Schyns, P. G. (2014). Dynamic facial expressions of emotion transmit an evolving hierarchy of signals over time. Current Biology, 24(2), 187–192.

    Article  Google Scholar 

  38. Kass-Hout, T. A., & Alhinnawi, H. (2013). Social media in public health. British Medical Bulletin 108(1):5–24.

    Article  Google Scholar 

  39. Khatua, A., Khatua, A., & Cambria, E. (2019). A tale of two epidemics: Contextual Word2Vec for classifying twitter streams during outbreaks. Information Processing & Management, 56(1), 247–257.

    Article  Google Scholar 

  40. Kite, J., Foley, B.C., Grunseit, A.C., & Freeman, B. (2016). Please like Me: Facebook and Public Health Communication. PloS One 11(9), e0162765.

    Article  Google Scholar 

  41. Kivran-Swaine, F., & Naaman, M. (2011). Network properties and social sharing of emotions in social awareness streams. In: Proceedings of the ACM 2011 Conference on computer supported cooperative work, 379–82, 2011.

  42. Lin, X., Lachlan, K. A., & Spence, P. R. (2016). Exploring extreme events on social media: A comparison of user reposting/retweeting behaviors on Twitter and Weibo. Computers in Human Behavior, 65, 576–581.

    Article  Google Scholar 

  43. Marlow, C., Naaman, M., Boyd, D., & Davis, M. (2006). Position paper, tagging, taxonomy, flickr, article, toread. In: Collaborative Web Tagging Workshop at WWW’06, pp. 31–40.

  44. Meng, X., Wei, F., Liu, X., Zhou, M., Li, S., & Wang, H. (2012). Entity-centric topic-oriented opinion summarization in Twitter. In: Proceedings of the 18th ACM SIGKDD International Conference on knowledge discovery and data mining, pp. 379–387.

  45. Missier, P., McClean, C., Carlton, J., Cedrim, D., Silva, L., Garcia, A., Plastino, A., & Romanovsky, A. (2017). Recruiting from the network: Discovering Twitter users who can help combat zika epidemics. In: International Conference on web engineering, pp. 437–445.

  46. Moorhead, S. A., Hazlett, D. E., Harrison, L., Carroll, J. K., Irwin, A., & Hoving, C. (2013). A new dimension of health care: Systematic review of the uses, benefits, and limitations of social media for health communication. Journal of Medical Internet Research, 15(4), e85.

    Article  Google Scholar 

  47. Mukkamala, A., & Beck, R. (2016). Enhancing disaster management through social media analytics to develop situation awareness what can be learned from twitter messages about hurricane sandy? In: PACIS, p. 165.

  48. Murakami, D., Peters, G. W., Yamagata, Y., & Matsui, T. (2016). Participatory sensing data tweets for micro-urban real-time resiliency monitoring and risk management. IEEE Access, 4, 347–372.

    Article  Google Scholar 

  49. Naaman, M., Boase, J., & Lai, C.-H. “Is it really about me? Message content in social awareness streams. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, 189–92, 2010.

  50. Nielsen, F., Arup, A., Balslev, D., & Hansen, L. K. (2005). Mining the posterior cingulate: Segregation between memory and pain components. NeuroImage, 27(3), 520–532.

    Article  Google Scholar 

  51. Ozdikis, O., Senkul, P., & Oguztuzun, H. (2012). Semantic expansion of tweet contents for enhanced event detection in Twitter. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 20–24.

  52. Page, R. (2012). The linguistics of self-branding and micro-celebrity in Twitter: The role of hashtags. Discourse & Communication. https://doi.org/10.1177/1750481312437441.

    Article  Google Scholar 

  53. Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130. https://doi.org/10.3758/s13423-014-0585-6.

    Article  Google Scholar 

  54. Pöschko, J. (2011). Exploring Twitter hashtags. ArXiv Preprint ArXiv:1111.6553.

  55. Recuero, R., Zago, G., Bastos, M. T., & Araújo, R. (2015). Hashtags functions in the protests across Brazil. SAGE Open, 5(2), 2158244015586000.

    Article  Google Scholar 

  56. Reuter, C., Heger, O., & Pipek, V. (2013). Combining real and virtual volunteers through social media. . Iscram.

    Google Scholar 

  57. Reuter, C., Ludwig, T., Kaufhold, M.-A., & Pipek, V. (2015). XHELP: Design of a cross-platform social-media application to support volunteer moderators in disasters. In: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 4093–4102.

  58. Rieffe, C., Oosterveld, P., Miers, A. C., Terwogt, M. M., & Ly, V. (2008). Emotion awareness and internalising symptoms in children and adolescents: The EMOTION AWARENESS QUESTIONNAIRE REVISED. Personality and Individual Differences, 45(8), 756–761.

    Article  Google Scholar 

  59. Rimé, B., Finkenauer, C., Luminet, O., Zech, E., & Philippot, P. (1998). Social sharing of emotion: New evidence and new questions. European Review of Social Psychology, 9(1), 145–189.

    Article  Google Scholar 

  60. Scherer, K.R. (2000). Psychological models of emotion. The Neuropsychology of Emotion 137(3), 137–162.

    Google Scholar 

  61. Shapp, A. (2014). Variation in the use of Twitter hashtags (Qualifying Paper in Sociolinguistics). . New York University.

    Google Scholar 

  62. Stieglitz, S., Bunker, D., Mirbabaie, M., & Ehnis, C. (2018). Sense-making in social media during extreme events. Journal of Contingencies and Crisis Management, 26(1), 4–15.

    Article  Google Scholar 

  63. Tang, L., Bie, B., Park, S.-E., Zhi, D. (2018) Social media and outbreaks of emerging infectious diseases: a systematic review of literature. American Journal of Infection Control 46(9), 962–972.

    Article  Google Scholar 

  64. Varol, O., Ferrara, E., Ogan, C. L., Menczer, F., & Flammini, A. (2014). Evolution of online user behavior during a social upheaval. In: Proceedings of the 2014 ACM Conference on Web Science, 81–90.

  65. Yang, L., Sun, T., Zhang, M., & Mei, Q. (2012). We know what@ you# tag: Does the dual role affect hashtag adoption? In: Proceedings of the 21st International Conference on World Wide Web, 261–270.

  66. Zhou, L., Zhang, D., Yang, C. C., & Wang, Yu. (2018). Harnessing social media for health information management. Electronic Commerce Research and Applications, 27, 139–151.

    Article  Google Scholar 

Download references

Funding

No funding was received for the completion of this work.

Author information

Corresponding author

Correspondence to Cantay Caliskan.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethical approval

The author declares that this work was composed by the author alone and has not been submitted for consideration to any other research outlet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Core hashtags

#2019_ncov, #2019_nCoV, #2019ncov, #2019nCoVmissouri, #ARDS, #Asthma, #Congestionnasal, #convid19, #COPD, #corinavirus, #cornavirus, #corona, #coronachina, #coronaoutbreak, #coronavairus, #coronavid19, #coronavirius, #coronavirus, #coronavirüs, #coronaviruses, #coronavirusitalianews, #coronavirusitaly, #coronavirusoutbreak, #coronaviruspandemic, #coronaviruss, #coronavirüsü, #coronavirusupdates, #cotonavirus, #cov19, #covd19, #COVID, #cov?d, #covi?d_19, #COVID19, #Covid19, #covid19, #cov?d19, #covi?d19, #Covid-19, #covid19italia, #covid19news, #covid19outbreak, #covid19pr, #covid2019, #Covidiots, #covidnews, #covid?19, #cvid19, #DeviatedSeptum, #disease, #dontpanic, #epidemic, #FlattenTheCurve, #Flu, #Grippe, #H1N1, #HcoV19, #illness, #Influenza, #IStayHomeFor, #Legionnaires, #LockdownNow, #ncov, #ncov19, #nCoV19, #ncov2019, #nCoV2019, #Pandemic, #pandemic, #plagueinc, #pleuralEffusion, #Pneumonia, #precaution, #PreventingTheFlu, #prevention, #quarantine, #SafeHands, #SARSCoV2, #sarscov2, #SocialDistancing, #StayAtHomeChallenge, #StayHome, #staysafe, #TogetherAtHome, #ViewFromMyWindow, #virus, #viruses, #worldhealth, #worldhealthorganization, #wuhan, #WuhanPneumonia, #wuhanvirus, #WuhanVirus
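
For illustration, the minimal Python sketch below shows one way a core-hashtag list such as the one above could be matched against a tweet stream. It is a sketch under stated assumptions, not the paper's actual collection pipeline: the file name tweets.jsonl, the full_text field, and the abbreviated hashtag set are hypothetical placeholders.

import json
import re

# Abbreviated, lowercased stand-in for the full core-hashtag list above.
CORE_HASHTAGS = {
    "#covid19", "#coronavirus", "#pandemic", "#socialdistancing", "#wuhanvirus",
}

# A hashtag is '#' followed by word characters.
HASHTAG_PATTERN = re.compile(r"#\w+")

def is_covid_related(text):
    """Return True if the text contains at least one core hashtag.

    Matching is case-insensitive, which collapses mixed-case variants
    such as #COVID19, #Covid19, and #covid19 from the appendix list.
    """
    tags = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return not tags.isdisjoint(CORE_HASHTAGS)

# Hypothetical input: one JSON object per line with a 'full_text' field.
with open("tweets.jsonl", encoding="utf-8") as f:
    covid_tweets = [
        tweet for tweet in map(json.loads, f)
        if is_covid_related(tweet.get("full_text", ""))
    ]

print(f"{len(covid_tweets)} COVID-related tweets retained")

Because both the tweet's hashtags and the core set are lowercased before comparison, only one lowercase form of each hashtag needs to be stored.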


About this article

Cite this article

Caliskan, C. How does “A Bit of Everything American” state feel about COVID-19? A quantitative Twitter analysis of the pandemic in Ohio. J Comput Soc Sc 5, 19–45 (2022). https://doi.org/10.1007/s42001-021-00111-1


Keywords

  • COVID-19
  • Twitter
  • Awareness
  • Emotion classification