1 Introduction

Emojis are graphic symbols that carry specific meanings. Quickly adopted into online conversations and widely used to express emotions, they have become a universal language across apps, platforms, and cultures [1,2,3]. Compared with words, emojis can express individual emotions more accurately and directly [4]. In the analysis of short informal texts, such as tweets, blogs, or comments, emojis provide crucial information on user sentiment [5,6,7,8], and their emergence offers new opportunities to analyse sentiment expression in textual contexts. Moreover, it has been shown that emerging events, such as natural disasters, may lead to unrepresentative user moods and behaviours and affect the usage of emojis [9]. Changes in emoji use patterns can therefore reflect the impact of public health emergencies on personal moods. Despite the popularity of emojis, however, few studies have analysed the usage of emoji characters in social networks at scale and in depth, especially regarding emoji use patterns after a public emergency.

The coronavirus disease 2019 (COVID-19) pandemic [10,11,12] had caused over 242 million cumulative confirmed cases and more than 4.93 million deaths worldwide as of October 22, 2021 [13]. Some studies have indicated that the pandemic not only threatens physical health but also affects individuals’ mental state and emotions [14]. According to behavioural immune system theory [15], people are more likely to develop negative emotions (such as aversion and anxiety) [16, 17] for self-protection when faced with a public crisis. In line with this theory, stress theory [18] and perceived risk theory [19] also suggest that public health emergencies trigger more negative emotions. In the context of disease, these negative emotions keep people away from potential pathogens. However, long-term negative emotions may reduce people’s immune function and disrupt the balance of their normal physiological mechanisms [14, 20]. Therefore, it is essential to promptly understand the potential psychological changes caused by COVID-19.

Here, we present an effort to collect and analyse all emojis embedded by online users in posts during the COVID-19 outbreak on Sina Weibo, the most popular public social media site in China. To the best of our knowledge, this is the first study to examine a large and fine-grained dataset of over 38 million social media posts to understand the evolution of emoji usage before and after COVID-19 and accordingly uncover the impact of this pandemic on individual behaviour and moods. There are three main innovations in this paper:

(1) We compared emoji usage with the evolution of individual sentiment during normal times and after the COVID-19 pandemic outbreak and revealed the emoji use patterns of Chinese social media users after public health emergencies.

(2) We built an emoji co-occurrence network and a maximum frequency co-occurrence network between emojis on the Sina Weibo platform, which directly disclosed the co-occurring relationship of different emojis in posts.

(3) We comprehensively analysed the relationship between the development trend of the pandemic, different post topics, user types (user level, user gender) and individual emoji use habits.

2 Related work

The unprecedented magnitude and transmission speed of COVID-19 have intensified social media activity as people isolate at home to break infection chains [21,22,23,24,25,26]. The large amount of social data generated by online users can help us understand, in real time, the topics of concern and the emotional fluctuations of the population affected by the epidemic, which is necessary for the timely detection of people’s needs and for emotional counselling. Recent research has shown that public data on social networks can be employed to analyse the sentiments of internet users.

Nemes and Kiss [27] analysed users’ sentiments on the Twitter social media platform by using a recurrent neural network model to determine what emotional manifestations arose surrounding a given topic. Manguri et al. [28] collected a seven-day dataset from Twitter based on two specified hashtag keywords (“COVID-19, coronavirus”) and analysed the sentiment polarity and subjectivity of these COVID-19-related texts. They found that neutral emotions regarding both coronavirus and COVID-19 keywords were significantly high (50%). Currently, studies have attempted to use public data on Chinese social platforms to explore the emotional changes of Chinese citizens during the COVID-19 period. Wang et al. [29] randomly selected 999,978 COVID-19-related Weibo posts from 1 January 2020 to 18 February 2020 and adopted the bidirectional encoder representations from transformers (BERT) model to conduct sentiment classification. Tan et al. [30] conducted a sentiment analysis based on social media posts to examine the long-term negative effects of the COVID-19 pandemic on public sentiments.

However, almost all studies extracted the text content posted by users online as clues for sentiment analysis and focused on classifying sentences into three sentiment categories (positive, neutral, and negative). In addition, due to the expression diversity of Chinese textual content, the variability of Chinese syntax and semantics, and the rapid update of internet slang, it is still challenging to identify users’ sentiments from Chinese plain texts with high accuracy [31,32,33,34]. In recent years, due to their simplicity, convenience and conduciveness to emotional expression, emojis have been widely used in internet communication. Like nonverbal clues in face-to-face communication, emojis are able to help clarify intentions in ambiguous contexts [35], express emotions [36, 37] and improve the efficiency of communication [38]. Studies in the field of computer science have focused on using emojis as an effective means for sentiment analysis on UGC (user-generated content) data [39, 40].

Li et al. [41] constructed a sentiment model with automatically annotated training sets using an emoji approach, which performs well in classifying tweets into five different sentiment classes. Kaye [42] proved that as a contextual cue, emojis can help users establish an emotional tone, reduce the ambiguity of semantic expressions and improve appropriateness relative to context. Settanni and Marengo [43] investigated the relationship between user-generated textual content shared on Facebook and emotional well-being, and they examined the correlation between emoji usage behaviour and users’ stress level. In addition, Barbieri et al. [44] compared the meaning and usage of emojis across two Spanish cities and suggested that cultural differences between different regions can reflect how they communicate in social networks. Cheng et al. [45] found that the different cultural backgrounds of message senders could influence their emoji usage and emotional expressivity.

A number of studies have confirmed the effective performance of emojis in sentiment analysis. However, currently, there is little research on emoji usage during the COVID-19 period [46, 47], as well as on how public emergencies affect people’s emoji usage patterns (emotional expressions). In particular, to the best of our knowledge, this is the first study to examine the emoji usage of Chinese citizens before and after the COVID-19 outbreak.

3 Data and methods

3.1 Data description

Weibo is a popular Twitter-like social media platform in China, making it a rich publicly available data source for opinion mining and sentiment analysis. We collected all data from Weibo that were posted publicly by users located in Wuhan (the capital of Hubei Province in China), including microblog text, posting time, author ID, and gender, from December 1, 2019, to March 20, 2020. By extracting emoji tags from the post texts, we compared the changes in emoji usage patterns and observed the sentiment evolution of Weibo users before and after the COVID-19 outbreak to explore the impact of COVID-19 on users’ online behaviour. Overall, 38,183,194 microblog posts posted by 2,239,472 unique users were collected.
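The extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that emoji tags appear in the raw post text as short bracketed labels such as `[加油]`, and the pattern, the length cap, and the function name `extract_emoji_tags` are all hypothetical.

```python
import re

# Assumption: emoji tags appear as short bracketed labels, e.g. "[加油]".
# The 1-10 character cap is a hypothetical guard against bracketed prose.
EMOJI_TAG = re.compile(r"\[([^\[\]\s]{1,10})\]")


def extract_emoji_tags(text: str) -> list:
    """Return the emoji labels found in one post, in order of appearance."""
    return EMOJI_TAG.findall(text)


# Toy posts (invented for illustration only).
posts = ["武汉加油[加油][心]", "今天没有表情", "[泪][泪]太难过了"]
tags_per_post = [extract_emoji_tags(p) for p in posts]

# Share of posts that contain at least one emoji tag.
share_with_emoji = sum(1 for tags in tags_per_post if tags) / len(posts)
```

Counting posts with a non-empty tag list, as in the last line, gives the kind of per-day share that is tracked over time in the analysis.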

We found that emotion tokens, i.e., emoji characters, were commonly used in Weibo posts. There were 15,609,843 posts containing emoji symbols, accounting for 40.88% of all posts. In addition, 1,279,828 users used emojis at least once, accounting for 57.15% of all unique users. Surprisingly, while no substantial change was observed in the daily number of posts over time (Fig. 1A), both the proportion of posts with emojis and the ratio of users using emojis declined substantially after January 18, 2020. Around this time, a high-level epidemiological team led by the well-known Chinese respiratory expert Zhong Nanshan arrived in Wuhan from Beijing and later confirmed human-to-human transmission of severe acute respiratory syndrome coronavirus 2 (the causative virus for COVID-19) with infections among medical staff (Fig. 1B). The public began to recognize the high risk of COVID-19 and pay more attention to the epidemic. At this time, the average number of emojis used by each user dropped dramatically; in contrast, the number of posts published by each user increased slightly (Fig. 1C). Similarly, we calculated the average number of emojis employed in each post as well as the number of emojis used by each user in each post and found that they both decreased remarkably after January 18, 2020. In addition, the topic keywords discussed on Weibo in the early period of our dataset were analysed, and it was found that “pneumonia” (“肺炎” in Chinese) began to appear among the popular topics on January 5, 2020 (see Fig. S2 in Supplementary Information), even before COVID-19 was discovered and announced.
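The two headline shares can be reproduced directly from the counts reported above. The snippet is only an arithmetic check; the variable names are illustrative.

```python
# Counts as reported in the text.
total_posts = 38_183_194
posts_with_emoji = 15_609_843
total_users = 2_239_472
users_with_emoji = 1_279_828

# Share of posts containing emojis and share of users who used emojis.
post_share = posts_with_emoji / total_posts
user_share = users_with_emoji / total_users

print(f"posts with emojis: {post_share:.2%}")
print(f"users using emojis: {user_share:.2%}")
```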

Fig. 1 (A) Number of active users, posts, and emojis; (B) The proportion of Weibo posts with embedded emojis and the proportion of users who used emojis; (C) Average number of posts and emojis per user

3.2 Analytical framework

To comprehensively measure the impact of COVID-19 on users’ online social behaviour and public sentiment, changes in the emoji usage of online users during the COVID-19 outbreak were compared from multiple perspectives. First, by determining the most frequently used emojis daily, we selected two sets of popular emojis on Weibo before and after the COVID-19 outbreak and compared the differences. By utilizing the official annotations of emoji characters, which provide their textual descriptions by “translating” emojis back into words, we manually classified all popular emoji characters into five groups and examined the daily use frequency of different emoji groups. In analysing the correlation of emoji usage with COVID-19 infections, the Pearson correlation coefficient (r) was calculated. In addition, we compared the temporal patterns of emoji occurrence over 111 days to explore the impact of COVID-19 on the diurnal moods of Wuhan residents. By considering emojis as nodes and the co-occurrence relationship between any two emojis as edges, we also analysed the emoji co-occurrence structure from the perspective of complex networks. Finally, we examined the difference in emoji usage between different topics, user levels, and gender groups to explore other factors that may affect emoji usage. In this study, all text segmentation work was carried out with Jieba [48, 49], a commonly used Chinese word segmentation package. Topics discussed by online users were examined by employing the Latent Dirichlet Allocation (LDA) model [50, 51], a three-level hierarchical Bayesian model, which aims to understand and extract hidden topics from large volumes of text documents [52]. The overall analytical framework is illustrated in Fig. 2.

Fig. 2 The workflow of analysis in this study

4 Results

4.1 Basic statistical analysis

Popular emojis

Similar to word frequency, which usually conforms to Zipf’s law or Pareto’s law [53, 54], there is also a large heterogeneity in the popularity of different emojis; i.e., the top 100 popular emojis account for approximately 96% of all emojis used daily, and the top 10 popular emojis account for approximately 50.4% (Fig. S3). Among the 1,840 unique emojis, we defined as popular any emoji that ranked among the top 100 most frequently used emojis at any time during the collection period, which yielded 154 unique emojis overall. In determining the daily top 10 popular emojis on Weibo before and after the COVID-19 outbreak (Fig. 3), we found that after the outbreak, most of the newly added popular emojis were related to negative emotions, such as sadness, anger, and disappointment. A few positive emojis, such as Cheer for Wuhan and Peace Lantern, were also posted frequently, expressing support for anti-epidemic work and encouragement and cheering for medical staff. In addition, due to the shortage of daily necessities in quarantined communities, the use of food-related emojis, such as Chicken Leg, increased significantly.
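One way to obtain such a popular set is to take the union of each day's most frequent emojis. The sketch below assumes this reading of the selection rule (the text does not spell out the exact procedure); `popular_emojis`, `top_share`, and the toy data are hypothetical.

```python
from collections import Counter


def popular_emojis(daily_emojis: dict, top_n: int) -> set:
    """Union of each day's top_n most frequent emoji labels.

    daily_emojis maps a date string to the list of emoji labels used that day.
    """
    popular = set()
    for emojis in daily_emojis.values():
        counts = Counter(emojis)
        popular.update(label for label, _ in counts.most_common(top_n))
    return popular


def top_share(emojis: list, top_n: int) -> float:
    """Fraction of all emoji uses covered by the top_n most frequent labels."""
    counts = Counter(emojis)
    top_total = sum(count for _, count in counts.most_common(top_n))
    return top_total / len(emojis)


# Toy data (invented): two days with different favourites.
daily = {
    "2020-01-10": ["笑", "笑", "笑", "心", "心", "泪"],
    "2020-02-10": ["泪", "泪", "泪", "加油", "加油", "笑"],
}
popular = popular_emojis(daily, top_n=2)
share = top_share(daily["2020-01-10"], top_n=2)
```

Because the daily rankings shift over time, the union is larger than `top_n`, which is consistent with a daily top 100 yielding 154 unique emojis over the whole period.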

Fig. 3 Comparison of the daily top 10 popular emoji sets before and after the COVID-19 outbreak

Emoji classification

According to the emotion expressed by the emoji, we classified the 154 popular emojis into 5 categories, i.e., Happy, Encouraging, Questioned, Sad, and Angry (Table S1). By analysing the evolution in the usage of different emoji groups over time, we found that after January 18, 2020, the number of emojis in the category Happy decreased rapidly, while the number of emojis relevant to Sad increased. Furthermore, the daily proportion of Happy emojis declined significantly and remained low (approximately 0.32) for nearly 29 days after the COVID-19 outbreak, while the daily proportion of emojis in the Sad and Encouraging categories rose to 0.24 and 0.34, respectively (Fig. 4). On January 31, 2020, Li Wenliang [55], an ophthalmologist at Wuhan Central Hospital, posted his experience of being summoned by the police for "making false comments on the internet about unconfirmed SARS outbreaks" on Weibo. At this time, the use of anger-related emojis reached its peak. These results indicate that the COVID-19 epidemic significantly impacted individuals’ moods, and accordingly, people tended to express more sadness and encouragement after the COVID-19 outbreak.

Fig. 4 The use probability of emojis in different emotion categories and the newly infected cases of COVID-19 in Wuhan

Correlation of emoji usage with COVID-19

We found that the average number of emojis used in each post during the period was negatively correlated with the proportion of pneumonia-related posts (r = -0.698, p < 0.001); i.e., the more pneumonia-related posts were published, the fewer emojis were utilized. This suggests that Weibo users were likely to post fewer emojis when mentioning COVID-19-related topics. Moreover, the rate of emojis relevant to Happy was negatively correlated with the ratio of pneumonia-related posts (r = -0.808, p < 0.001). In contrast, the proportion of Sad emojis and the rate of posts mentioning pneumonia showed a significant positive correlation (r = 0.813, p < 0.001). Surprisingly, the proportion of Encouraging emojis was also positively correlated with the proportion of pneumonia-related posts (r = 0.604, p < 0.001), indicating that although the COVID-19 outbreak negatively affected individuals’ moods, the positive emotions regarding encouragement and cheering also increased, which to some extent demonstrated the confidence and support of Wuhan residents for anti-epidemic work.
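For reference, the Pearson coefficient used in these comparisons can be computed from two daily series as sketched below. The function name and the toy series are hypothetical and do not reproduce the study's data; the sketch also omits the significance test (p value), which would need a statistics library.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Toy daily series (invented): share of pneumonia-related posts
# and share of Happy emojis on the same days.
pneumonia_share = [0.01, 0.05, 0.20, 0.30, 0.25]
happy_share = [0.50, 0.47, 0.36, 0.31, 0.33]
r = pearson_r(pneumonia_share, happy_share)
```

A value of `r` close to -1 indicates that days with more pneumonia-related posts tend to have a smaller share of Happy emojis.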

4.2 Temporal pattern of emoji usage

Hourly pattern

In the temporal analysis of emoji usage, we first divided the study time into 4 periods in accordance with the development of COVID-19. The first period (P1) covers the time from December 01, 2019, to January 01, 2020; the second period (P2) is from January 01, 2020, to January 18, 2020; the third period (P3) is from January 18, 2020, to February 18, 2020; and the fourth period (P4) is from February 18, 2020, to March 20, 2020.

We found that the hourly changes in the number of emojis used during the whole period were consistent with the number of posts; i.e., the number of emojis and posts both peaked at 10:00 pm and were lowest at approximately 5:00 am. However, the hourly emoji use rate (average number of emojis used in each post) after the COVID-19 outbreak (P3 and P4) was clearly different from the pattern before the outbreak (P1 and P2), as shown in Fig. 5A. Before the outbreak, the emoji use rate peaked at midnight or in the early morning, which is when the rate during P3 and P4 was at its lowest. In addition, users became more negative late at night (approximately 3:00 am); i.e., the proportion of emojis related to happiness and encouragement decreased significantly, while emojis in the Sad and Angry categories increased dramatically (Fig. 5B). This suggests that COVID-19 negatively impacted individuals’ moods and that online users were more psychologically vulnerable late at night, often communicating the pessimistic part of their emotions.
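The hourly emoji use rate per period can be sketched as below. The period boundaries follow the dates given above; since the stated periods share their boundary days, the sketch assumes half-open intervals in which a boundary day belongs to the later period. All names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: half-open intervals; a boundary day belongs to the later period.
PERIOD_STARTS = [
    ("P1", datetime(2019, 12, 1)),
    ("P2", datetime(2020, 1, 1)),
    ("P3", datetime(2020, 1, 18)),
    ("P4", datetime(2020, 2, 18)),
]
STUDY_END = datetime(2020, 3, 21)  # exclusive; last collected day is March 20, 2020


def period_of(ts):
    """Return the period label for a timestamp, or None outside the study window."""
    if not (PERIOD_STARTS[0][1] <= ts < STUDY_END):
        return None
    label = None
    for name, start in PERIOD_STARTS:
        if ts >= start:
            label = name
    return label


def hourly_emoji_rate(posts):
    """posts: iterable of (timestamp, emoji_count) pairs.

    Returns {(period, hour): average number of emojis per post}.
    """
    emoji_sum = defaultdict(int)
    post_count = defaultdict(int)
    for ts, n_emojis in posts:
        period = period_of(ts)
        if period is None:
            continue
        key = (period, ts.hour)
        emoji_sum[key] += n_emojis
        post_count[key] += 1
    return {key: emoji_sum[key] / post_count[key] for key in post_count}


# Toy posts (invented): two in P1 at 3 am, one in P3 at 3 am.
sample = [
    (datetime(2019, 12, 5, 3, 10), 2),
    (datetime(2019, 12, 6, 3, 40), 0),
    (datetime(2020, 2, 1, 3, 5), 1),
]
rates = hourly_emoji_rate(sample)
```

Dividing by the number of posts in each hour is what standardizes the curves, so that periods with different posting volumes remain comparable.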

Fig. 5 (A) Daily pattern of the emoji use frequency in 4 periods (standardized with the number of posts per hour); (B) Daily pattern of the proportion of emojis in different emotion categories (standardized with the total number of emojis per hour) before and after the COVID-19 outbreak; (C) The hourly number of emojis used in each post on weekdays and weekends; (D) The proportion of different types of emojis used at each hour on weekdays and weekends

Weekly pattern

On weekdays, the emoji use rate (the number of emojis used in each post) was highest during the evening (approximately 10:00 pm) and early morning (4:00–6:00 am). However, the emoji use rate on weekends had no obvious peak, and the daily emoji use pattern showed a more even distribution (Fig. 5C). In addition, the diversity of emojis used on weekends was much higher than that on weekdays, even though no more emojis were used. On weekends, the proportion of Encouraging and Happy emojis used was significantly higher in the morning (approximately 9:00 am) than at other times (Fig. 5D), indicating that people tended to be more positive and optimistic in the morning on weekends. However, whether on weekdays or weekends, the use of emojis related to negative emotions (Sad and Angry) reached its peak at midnight. This also suggests that people were more likely to reveal their psychologically vulnerable and negative side late at night. The difference in emoji usage between weekdays and weekends also varied significantly with the development of COVID-19. During P1 and P2, there was a clear peak in the emoji use rate in the early morning (approximately 4:00 am) on working days, while after the COVID-19 outbreak, the daily trends on weekdays and weekends both remained relatively stable throughout the day (Fig. S6). This could be attributed to the stay-at-home quarantine imposed to control the spread of COVID-19, under which residents’ time was distributed more evenly.

4.3 Emoji co-occurrence network

Network construction

To understand the collocation pattern among emojis, we constructed a co-occurrence network in which nodes stand for unique emojis and the weight of an edge is the number of times the two emojis were used together. The network contains 1,711 nodes and 31,878 edges, with an average degree of 37.3; i.e., each emoji is used, on average, together with approximately 37 other emojis in posts. The most connected node has a degree of 591, while more than 45% of nodes have a degree less than 10 and approximately 87% of nodes have a degree less than 100. The average weighted degree of the network is 298.4, indicating that, on average, each emoji appears with other emojis approximately 298 times. The highest weighted degree of any node is 12,970, a consequence of the great number of Christmas greetings during the data collection period, and the highest co-occurrence frequency of any emoji pair (edge) in posts is 66,653 (weight of edge); however, the weight of approximately 91% of the edges in the network is less than 100, and the weight of more than 74% of the edges is less than 10. In addition, we found that there was a clear community structure in the co-occurrence network.
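A minimal sketch of this construction is given below. It assumes that a pair of emojis is counted once per post in which both appear (the text does not say whether repeated uses within one post count separately); the function names and toy posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(posts_emojis):
    """Return {(emoji_a, emoji_b): weight} for an undirected co-occurrence network.

    Assumption: each pair is counted once per post in which both emojis appear.
    """
    edges = Counter()
    for emojis in posts_emojis:
        unique = sorted(set(emojis))  # sorted so that (a, b) and (b, a) share one key
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


def degree_stats(edges):
    """Return (average degree, average weighted degree) over nodes with edges."""
    degree = Counter()
    weighted = Counter()
    for (a, b), weight in edges.items():
        degree[a] += 1
        degree[b] += 1
        weighted[a] += weight
        weighted[b] += weight
    n_nodes = len(degree)
    if n_nodes == 0:
        return 0.0, 0.0
    return sum(degree.values()) / n_nodes, sum(weighted.values()) / n_nodes


# Toy posts (invented), each given as its list of emoji labels.
posts = [["笑", "心"], ["笑", "心", "泪"], ["泪"], ["笑", "泪"]]
edges = build_cooccurrence(posts)
avg_degree, avg_weighted_degree = degree_stats(edges)

# Consistency check on the reported figures: average degree = 2E / N.
reported_avg_degree = 2 * 31_878 / 1_711
```

The last line reproduces the reported average degree from the node and edge counts, since every edge contributes to the degree of two nodes.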

Evolution of the emoji co-occurrence network

We extracted the emojis in all posts and constructed the emoji co-occurrence networks before and after the COVID-19 outbreak (Net1 and Net2, for short), as shown in Fig. 6A and B. We found that the number of nodes and edges in the network greatly decreased after the COVID-19 outbreak. This is consistent with the above findings, indicating that due to the epidemic, users reduced the use frequency of emojis. However, the average degree of this network increased from 34 to 39.8, which means that although the number of emojis used decreased, the diversity of emoji usage increased. After the COVID-19 outbreak, the average clustering coefficient of the emoji co-occurrence network also increased from 0.23 to 0.32, indicating that more triangular structures appeared in the emoji connections. The top 3 nodes with the largest weighted degree in Net2 (Table 1) are related to emotions of grievance, sadness and doubt, and they differ significantly from the most connected nodes in Net1. The emoji pairs with the highest co-occurrence frequency after the outbreak also changed considerably, and the use frequency of emojis relevant to sadness increased. By comparing the community structure of Net1 and Net2, we also described the flow of members (nodes) in the TOP-10 communities before and after the outbreak (Fig. 7).
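The link between the clustering coefficient and triangular structures can be made concrete with a short sketch. It assumes the standard unweighted definition, in which a node's local coefficient is the fraction of its neighbour pairs that are themselves connected and nodes with fewer than two neighbours contribute zero; the text does not state which variant was used, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def average_clustering(edge_list):
    """Average local clustering coefficient of an undirected, unweighted graph.

    Assumption: nodes with fewer than two neighbours contribute 0 to the average.
    """
    neighbours = defaultdict(set)
    for a, b in edge_list:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    if not neighbours:
        return 0.0
    total = 0.0
    for node, nbrs in neighbours.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbour pairs that are directly connected (closed triangles).
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [("a", "b"), ("b", "c")]
triangle_with_tail = triangle + [("c", "d")]
```

A triangle scores 1, a simple path scores 0, and adding a pendant node to the triangle lowers the average, so a rise in the coefficient reflects more closed triples among co-used emojis.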

Fig. 6 (A) Net1 (node’s degree > 1, edge’s weight > 100); (B) Net2 (node’s degree > 1, edge’s weight > 100). Nodes represent emojis, the size of the node represents the node’s weighted degree, and the colour of the node distinguishes different communities. The thickness of the edge represents the edge’s weight, i.e., the co-occurrence frequency between two nodes

Table 1 Comparison between Net1 and Net2

Fig. 7 Member flow in the TOP-10 communities of Net1 and Net2

Maximum co-occurrence frequency network

To discover the typical emoji collocation on Weibo, we built the maximum co-occurrence frequency network by retaining the node pair (edge) with the largest co-occurrence probability. The weight of the edge represents the co-occurrence probability between nodes (from 0 to 1). There were mainly 4 hub nodes in the network (Fig. 8A) during P1 and P2. After the COVID-19 outbreak, Flower replaced Longing as a new hub node, and more discrete emoji pairs, as well as small subclusters, appeared, e.g., the emoji pairs connected by Angry and a series of emojis about gestures (Fig. 8B). Emojis and emoji pairs related to sadness, gratitude, and encouragement were used more frequently. In addition, the cluster connected by Love shrank, but the size of the community linked by Facepalm grew significantly, and there were more emojis relevant to negative emotions, such as Dizzy, Scared, and Wearing Mask. These results indicate that users’ habits regarding emoji usage indeed changed after the COVID-19 outbreak. Due to the emergence of newly popular emojis and the evolution of the moods users wanted to express, the fixed emoji co-occurrence patterns were significantly altered.
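One plausible reading of this construction is sketched below: for each emoji, only the edge to its most likely partner is kept. The sketch assumes that the co-occurrence probability of partner j given emoji i is the edge weight divided by the total weight of all edges at i; the text does not define the probability, so both this definition and the function name are assumptions.

```python
from collections import defaultdict


def max_cooccurrence_edges(edges):
    """edges: {(a, b): weight} for an undirected co-occurrence network.

    Returns {node: (best_partner, probability)}, where the probability is
    assumed to be weight(node, partner) / total edge weight at node.
    """
    strength = defaultdict(float)
    incident = defaultdict(list)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
        incident[a].append((b, weight))
        incident[b].append((a, weight))
    best = {}
    for node, partners in incident.items():
        # Keep only the single strongest partner of each node.
        partner, weight = max(partners, key=lambda item: item[1])
        best[node] = (partner, weight / strength[node])
    return best


# Toy weights (invented).
edges = {("心", "笑"): 3, ("心", "泪"): 1, ("泪", "笑"): 2}
best = max_cooccurrence_edges(edges)
```

In this toy example both 心 and 泪 point to 笑, which illustrates how hub nodes emerge when many emojis share the same most likely partner.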

Fig. 8 The maximum co-occurrence frequency network of the top 100 most frequent emojis (A) before and (B) after the COVID-19 outbreak. The size of the node represents the node’s degree, and the thickness of the edge represents the probability of co-occurrence

4.4 Topic, user level and gender

Correlation between emoji usage and post topics

To examine whether the topic affected emoji usage in the post, we extracted all posts containing hashtags and analysed the correlation between different topics and emoji use patterns. All hashtags (##) on Weibo from December 1, 2019, to March 20, 2020, with over 500 users and 500 microblog posts, were manually divided into 4 groups according to the topic, i.e., stars, daily life, games or e-sports, and social events (including COVID-19). These 4 topics covered a total of 3,830,100 posts. Before the COVID-19 outbreak, most topics discussed online concerned stars and daily life. After the COVID-19 outbreak, discussions on these topics diminished dramatically, while the number of posts related to public social events rapidly increased (Fig. S10).
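The hashtag selection step could look like the sketch below. It assumes hashtags are delimited by a pair of `#` characters, as in `#武汉加油#`, and treats "over 500" as a strict threshold; the function name and the toy posts are hypothetical, and the manual grouping into four topics is not covered.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is the text enclosed by a pair of '#' characters.
HASHTAG = re.compile(r"#([^#]+)#")


def frequent_hashtags(posts, min_users=500, min_posts=500):
    """posts: iterable of (user_id, text) pairs.

    Returns the hashtags used by more than min_users distinct users
    in more than min_posts posts.
    """
    users = defaultdict(set)
    post_counts = defaultdict(int)
    for user_id, text in posts:
        for tag in set(HASHTAG.findall(text)):  # count each tag once per post
            users[tag].add(user_id)
            post_counts[tag] += 1
    return {
        tag
        for tag in post_counts
        if len(users[tag]) > min_users and post_counts[tag] > min_posts
    }


# Toy posts (invented); thresholds are lowered to 1 for the illustration.
sample = [
    ("u1", "#武汉加油# 加油"),
    ("u2", "#武汉加油#"),
    ("u1", "#游戏# 开黑"),
    ("u1", "#游戏# 再来"),
]
selected = frequent_hashtags(sample, min_users=1, min_posts=1)
```

Requiring both a user threshold and a post threshold filters out hashtags that are repeated many times by a single account.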

We then compared the evolution of the proportion of posts containing emojis on these four topics (Fig. 9A). Users posted emojis most frequently when discussing topics related to stars and entertainment, while emoji use in posts related to social events was the lowest, suggesting that users tend to be more cautious and objective when discussing public social events. After the COVID-19 outbreak, the proportion of posts containing emojis in topics on social events moderately increased, while the proportion regarding stars, daily life, and games dropped greatly. In addition, we found that most emojis used in these topics were related to Happy and Encouraging, while more Sad emojis were posted in posts related to social events. Using the total number of emojis used daily to standardize the number of emojis in different categories, we found that after the COVID-19 outbreak, users’ emotions showed greater fluctuations than before when discussing public social events, and they expressed more encouragement, sadness, and anger (Fig. 9B).

Fig. 9 (A) The proportion of posts containing emojis in different topics; (B) The proportion of different emojis used in posts related to social events; (C) The percentage of posts containing emojis posted by users at different levels; (D) The percentage of posts containing emojis posted by users of different genders

Differences between individual users and official users

There are five user levels on Weibo, i.e., Ordinary User, Popular User, Orange V, Gold V, and Blue V. The first four levels all represent personal accounts; among them, Orange V and Gold V are verified personal accounts, standing for public figures with a certain social influence (e.g., actors). Blue V stands for officially certified enterprise or government accounts. We found that the proportion of posts containing emojis published by official users dropped dramatically after the COVID-19 outbreak, while the proportion of posts published by personal users changed much more moderately (Fig. 9C). This indicates that due to the public health emergency, official users decreased their emoji use frequency, and the texts posted by the government and enterprises tended to be more formal and serious than usual.

Emoji usage by user gender

Comparing the evolution of emoji usage between male and female users, we found that the average number of emojis posted by male users decreased significantly after the COVID-19 outbreak, while that posted by female users slightly increased (Fig. S12). Comparing the proportion of posts containing emojis between these two groups, it is also obvious that after the epidemic, the emoji use frequency of male users when posting online dramatically decreased (Fig. 9D). Male users were more inclined to utilize fewer emotional symbols and more formal language, which to some extent suggests that when facing public risks, men tend to conceal or control their emotional display, while women are more emotionally expressive [56,57,58].

5 Conclusion & discussion

As the COVID-19 pandemic sweeps across the world, it is causing widespread concern, fear, and stress. This study aims to explore the impacts of this public health emergency on individuals’ social behaviour and sentiment by comparing the emoji usage of online users on Weibo before and after the COVID-19 outbreak.

We found that the number of emojis used decreased dramatically after the COVID-19 outbreak, and more emojis related to sadness and encouragement were posted. However, the use probability of Happy emojis significantly decreased. In addition, the proportion of Happy emojis showed a negative correlation with the rate of COVID-19-related posts (r = -0.808), while the proportion of Sad emojis was positively correlated with the proportion of COVID-19-related posts (r = 0.813). Surprisingly, the rate of Encouraging emojis was also positively correlated with the rate of posts related to COVID (r = 0.604). Although the COVID-19 outbreak negatively affected individual sentiments, some positive emotions also increased, implying the confidence and support of Wuhan residents for the anti-epidemic work. On the other hand, we found that at midnight, users tend to be more psychologically vulnerable and more willing to communicate the negative side of their moods. In addition, after the pandemic, the government and enterprises were inclined to post fewer emojis, preferring regular textual language. Similarly, male users used fewer emotional symbols, indicating that when facing public risks, compared with women, men tend to conceal or control their emotional display.

This study reveals the pattern of emoji use on social networks after a public emergency and helps fill the knowledge gap regarding online behavioural changes and the sentiment evolution of social platform users due to the COVID-19 outbreak. These results can provide useful references for policy-makers to plan reasonable interventions and psychological counselling for populations whose sentiments are strongly affected by COVID-19. Although there may be slight deviations in the analysis due to the inability to obtain deleted posts, this has no significant impact on the overall conclusions. As different use habits of emojis may lead to different sentiment semantics [59, 60], in future work, we intend to further explore the emotional meanings conveyed by emojis by introducing detailed emoji use habits of online users, and to analyse the usage patterns of larger sets of emojis across platforms and contexts.