From January 25 to April 30, 2020, we retrieved 316,988,440 COVID-19-related tweets, produced by 33,488,183 unique Twitter accounts. Preprocessing to account for original tweets and English language resulted in a subset of 20,614,490 original tweets written in English and produced by 4,834,467 unique accounts. Figure 1 shows the distribution of English tweets per day, while the profile of the retrieved dataset is summarized in Table 1.
For the purposes of this paper, we considered only original tweets written in English (20,614,490). Further preprocessing excluded 383,657 tweets with fewer than two words (1.86% of all retrieved original English tweets). The final corpus consisted of 20,230,833 tweets, corresponding to a total of 197,728,410 words and a vocabulary of 2,139,369 unique words.
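The corpus figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the mean number of words per tweet is a derived quantity that the text does not report, and the variable names are illustrative.

```python
# Counts as reported in the text.
original_english = 20_614_490   # original tweets written in English
too_short = 383_657             # tweets excluded for having fewer than two words
total_words = 197_728_410       # total words in the final corpus

# Size of the final corpus after dropping the short tweets.
final_corpus = original_english - too_short

# Share of original English tweets that were excluded, in percent.
excluded_pct = 100 * too_short / original_english

# Derived figure (not stated in the text): average tweet length in words.
mean_words_per_tweet = total_words / final_corpus

print(final_corpus)              # 20230833
print(round(excluded_pct, 2))    # 1.86
print(round(mean_words_per_tweet, 2))
```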
LDA parametrization experiments resulted in a value of 100 topics as a suitable initialization parameter. Screening of the topic analysis results identified 91 meaningful topics, which were organized into nine categories as follows:
Life during the pandemic: 18 topics corresponding to how Twitter users went through the pandemic. Examples include expressions of sentiment (e.g., anger and fear, and fear of dying). This category also includes ‘USA protests’, ‘USA primary elections (Wisconsin)’, and topics related to art in quarantine (e.g., ‘movies and video games in quarantine’ and ‘musical bands and groups’).
Pandemic management: 16 topics related to pandemic issues and how to manage them (e.g., ‘relief bills in USA’, ‘US White House task force’, ‘donations and relief funds’, etc.).
Medical: 12 topics discussing medical issues (e.g., ‘medical equipment and supplies’, ‘world health emergency declaration’, etc.), as well as ‘vaccine development’ and ‘virus origin’.
Outbreak: 13 topics addressing different cases of outbreak (e.g., cases in CPAC 2020 conference, China and Wuhan outbreak, Diamond Princess cruise ship outbreak, Arabic countries outbreak etc.).
Lockdown: 9 topics pertaining to specific lockdowns worldwide (e.g., ‘Nigeria lockdown’, ‘India lockdown’, ‘European countries and Japan lockdown’, etc.), but also to events that were canceled or postponed (e.g., football, basketball, etc.).
Economy: 10 topics that discuss the impact of COVID-19 on the economy. Examples include ‘lockdown and economy restart’, ‘cancellation fees and refunds’, and ‘bitcoin and cryptocurrencies’. This category also includes topics related to the impact on the economy (e.g., ‘impact on supply chain due to China lockdown’ and ‘impact on business and companies’).
Cases and deaths: 5 topics discussing the number of cases and deaths caused by COVID-19, such as ‘live data maps’, ‘confirmed deaths and recoveries’, and ‘death toll rising (China and Italy)’.
News and Fake News: 5 topics related to the ‘5G conspiracy theory’, the ‘misinformation spread in social media’, and the topic ‘US President claims disinfectants can cure’.
Preventive measures: 3 topics addressing the ‘facemasks’, the ‘social distancing’, and the ‘hand washing’.
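As a quick consistency check, the per-category topic counts listed above can be summed and compared with the 91 meaningful topics reported out of the 100 LDA topics. This is a minimal sketch using only the numbers in the text; the dictionary layout and the name `not_retained` are illustrative choices.

```python
# Number of topics per category, as listed in the text.
topics_per_category = {
    "Life during the pandemic": 18,
    "Pandemic management": 16,
    "Outbreak": 13,
    "Medical": 12,
    "Economy": 10,
    "Lockdown": 9,
    "Cases and deaths": 5,
    "News and Fake News": 5,
    "Preventive measures": 3,
}

lda_topics = 100  # number of topics used to initialize LDA

# Total number of topics assigned to a category.
meaningful = sum(topics_per_category.values())

# LDA topics that were not retained as meaningful after screening.
not_retained = lda_topics - meaningful

# Categories in descending order of topic count.
ranked = sorted(topics_per_category.items(), key=lambda kv: kv[1], reverse=True)

print(meaningful, not_retained)  # 91 9
print(ranked[0])                 # ('Life during the pandemic', 18)
```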
The relative popularity of each topic category (in descending order) is shown in Fig. 2. The entire list of topics is presented in “Appendix B” along with the percentage of the overall topic contribution used to calculate the topic popularity rank and the top 15 significant words defining each topic.
The ten most popular topics over the entire time span of the study are presented as word clouds in Fig. 3. The first and third most popular topics are ‘expression of extreme sentiment (anger & fear)’ (3.32%) and ‘expression of extreme sentiment (fear of dying)’ (2.69%). The second most popular topic is ‘USA President response’ (2.82%), and in the tenth position is the topic ‘information and guidelines updates’ (1.46%). There are 2 topics related to cases and deaths (‘death toll rising (China and Italy)’ and ‘number of cases and deaths’), another 2 topics from the life during the pandemic category (‘reading and writing in quarantine’ and ‘quarantine time eating and activities’), and the topics ‘fears for impact on stock market’ and ‘ban of flights to/from China’ from the economy and lockdown categories, respectively.
Topics popularity evolution
Linear regression analysis showed a significant linear fit (R² > 80% and p value ≤ 0.05) for 15 topics; 13 topics showed a positive trend and 2 topics showed a negative trend. The three topics with the strongest positive trend are: (a) ‘impact on business and companies’, (b) ‘support and donations’, and (c) ‘mental health during the pandemic’. The two topics with a negative trend are: (a) ‘disease spreading’ and (b) ‘live data maps’. Topics with a significant linear fit are marked in Table 2 in “Appendix B”.
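The trend test described above can be illustrated with an ordinary least-squares fit of weekly popularity against week index. The sketch below is not the study's implementation: the function name `linear_trend` and the weekly values are hypothetical, and only the slope and R² are computed. The p value criterion is omitted here because it needs a t-test on the slope, which a statistics library (for example `scipy.stats.linregress`) would report.

```python
def linear_trend(y):
    """Fit y = intercept + slope * week by ordinary least squares.

    Weeks are indexed 1..len(y). Returns (slope, intercept, r_squared).
    """
    n = len(y)
    x = list(range(1, n + 1))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared correlation.
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical weekly popularity (%) for one topic over 15 weeks.
# These values are made up for illustration; they are not study data.
weekly_popularity = [0.50, 0.58, 0.61, 0.72, 0.80, 0.84, 0.95, 1.01,
                     1.10, 1.15, 1.27, 1.30, 1.41, 1.48, 1.60]

slope, intercept, r2 = linear_trend(weekly_popularity)
trend = "positive" if slope > 0 else "negative"
passes_r2 = r2 > 0.80  # the R^2 criterion quoted in the text

print(f"slope={slope:.3f} R^2={r2:.3f} trend={trend} passes_r2={passes_r2}")
```

A topic would count as having a significant linear fit only if it met both the R² criterion and the p value criterion; the sign of the slope then gives the direction of the trend.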
The following diagrams present plots of topics’ popularity per week for all topics for the 15-week span covered by this study, organized per category.
Figure 4 shows topics related to the Life during the Pandemic category. The most popular topics, ‘expression of extreme sentiment (anger & fear)’ and ‘expression of extreme sentiment (fear of dying)’, show an initial decreasing trend, followed by a clear peak in the 11th week (Fig. 4a). Additionally, the topic ‘stay home safe’ peaks in the 13th week (Fig. 4b). Most of the least popular topics (Fig. 4c) show brief peaks; exceptions are the increasing trend of the topic related to ‘art in quarantine’ and the decreasing trend of the topic ‘musical bands and groups’.
Weekly variation of popularity for topics in the Pandemic Management category is shown in Fig. 5. The most popular topics of this category (Fig. 5a) are related to specific events or announcements and show peaks in specific weeks. On the other hand, topics related to various pandemic management strategies and instruments (Fig. 5b,c) show a linear fit with an increasing trend.
Weekly variation of popularity for topics in the Medical category is shown in Fig. 6. The topic on ‘world health emergency declaration’ shows repeated consecutive peaks of decreasing height for the duration of the study (Fig. 6a). A number of topics related to vaccines, treatment, and medical tools and supplies (Fig. 6b) vary in popularity over the time span of the study, with an overall increasing trend.
Figure 7 shows topics related to the Outbreak category. The most popular topic is ‘China and Wuhan outbreak’, with a strong prevalence up to the 8th week, while the topic ‘Diamond Princess cruise ship outbreak’ predominates during the 6th to 8th weeks (Fig. 7a). Among the less popular topics (Fig. 7b), those related to specific events, e.g., the topics ‘cases in CPAC 2020 conference’ and ‘UK Prime Minister infection’, show a single clear peak in the 11th week and 15th week, respectively. Country- or region-specific outbreaks follow trend lines with recurrent peaks.
The topics related to the Lockdown category are shown in Fig. 8. The topic ‘ban of flights to/from China’ shows a high peak in the 5th week, followed by a quick decline. Topics related to school lockdowns and event cancelations appear to be of equivalent popularity (Fig. 8a) and overall of more importance than the country- or region-specific lockdowns shown in Fig. 8b.
Topics related to the Economy category are shown in Fig. 9. The topics ‘impact on supply chain (due to China lockdown)’ and ‘fears for impact on stock market’ peak in the 8th and 9th weeks, respectively (Fig. 9a). The topic ‘lockdown and economy restart’, albeit of low popularity, shows a clearly increasing trend (Fig. 9b).
Figure 10 shows topics of the Cases and Deaths category. The topic ‘number of cases and deaths’ peaks in the 7th week and the topic ‘death toll rising (China and Italy)’ peaks in the 10th week (Fig. 10a). Only the topic ‘confirmed deaths and recoveries’ shows an increasing trend (Fig. 10b).
Topics related to the News and Fake News category are shown in Fig. 11. The topics ‘misinformation spread in social media’ and ‘news channels updates’ are the most popular of this category and show a slowly decreasing trend, while the topic ‘US President claims disinfectants can cure’ shows a sharp peak in the 17th week.
Figure 12 shows topics related to the Preventive Measures category. The only topic that reaches 2% of the topic contribution, even for a single week, is ‘hand washing’, which peaks in the 10th week, while the ‘facemasks’ and ‘social distancing’ topics remain of comparatively low popularity.
Processing for geographic origin showed that tweets in the final corpus originated from 248 countries, while country of origin could not be determined for 6,822,526 tweets (33.7% of the tweets included in the final corpus). Figure 13 shows the tweet contribution of the 20 most contributing countries.
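The share of tweets without a determinable country follows directly from the counts given in the text. In this short sketch the number of geolocated tweets is a derived figure that the text does not report.

```python
final_corpus = 20_230_833   # tweets in the final corpus (from the text)
undetermined = 6_822_526    # tweets with no determinable country (from the text)

# Derived figure: tweets for which a country of origin was determined.
located = final_corpus - undetermined

# Share of the final corpus with undetermined origin, in percent.
undetermined_pct = 100 * undetermined / final_corpus

print(located)                     # 13408307
print(round(undetermined_pct, 1))  # 33.7
```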
The 10 most popular topics for the four countries with the highest number of tweets in the corpus are shown in Fig. 14. The most popular topic in each of these countries is different and, as expected, of regional interest. Topics related to expression of extreme sentiments, either anger and fear or fear of dying, appear in the top 10 list in all these countries. However, considering countries with more than 10,000 tweets, Greece was the country with the highest expression of the topic ‘expression of extreme sentiment (anger & fear)’ (11.3% of the overall tweet corpus of this country), while Hungary was the country with the highest expression of the topic ‘expression of extreme sentiment (fear of dying)’ (38.9% of the overall tweet corpus for this country).
Synthesis of results: Correlation with real-world events
Real-world events of the period of the study were used as points in the weekly popularity timeline to identify correlations and provide evidence of the efficiency and effectiveness of the popularity evolution analysis and geographical distribution followed in the study.
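Matching an event to a point on the weekly timeline requires converting its date to a week number. The sketch below assumes that the week numbers used in this section are ISO calendar weeks of 2020. The text does not state this, but the assumption is consistent with the event dates and week numbers quoted in this section. The event labels and the helper `iso_week` are illustrative.

```python
from datetime import date

def iso_week(d):
    """Return the ISO calendar week number of a date."""
    return d.isocalendar()[1]

# Event dates quoted in this section (year 2020, the study period).
events = {
    "Japan school closure request": date(2020, 2, 27),
    "White House task force press conference": date(2020, 2, 26),
    "Coronavirus Aid, Relief, and Economic Security Act signed": date(2020, 3, 27),
    "UK Health Secretary statement on panic-buying": date(2020, 3, 6),
}

for label, day in events.items():
    print(f"week {iso_week(day):2d}: {label}")

# Number of calendar weeks touched by the collection period
# (January 25 to April 30, 2020), under the same assumption.
first_week = iso_week(date(2020, 1, 25))
last_week = iso_week(date(2020, 4, 30))
study_weeks = last_week - first_week + 1
print(first_week, last_week, study_weeks)  # 4 18 15
```

Under this assumption the collection period covers weeks 4 to 18, which agrees with the 15-week span reported for the study.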
Weekly variation of popularity for topics in the Lockdown category is shown in Fig. 15, together with related real-world events. On 27 February, Prime Minister Shinzo Abe requested that all Japanese elementary, junior high, and high schools close until early April to help curb the virus. This coincides with the 9th-week peak of the topic ‘European countries and Japan lockdown’. The topic ‘Nigeria lockdown’ also shows a first peak in the 9th week; at the same time, the first confirmed case in Nigeria was announced (27 February). The same topics peak again during the 13th week and then follow an increasing trend; this is corroborated by the fact that on March 23rd the Ebonyi State government in southeastern Nigeria banned all public gatherings, while at the same time Kwara State and Lagos State announced the indefinite closure of their schools [36, 37]. On 24 March, the Government of India under its Prime Minister ordered a nationwide lockdown, and this is reflected in the topic ‘India lockdown’, which sharply increases in popularity during the 2 previous weeks.
Figure 16 shows representative USA topics and events. Clearly, the topic ‘warning by Dr. Fauci’ peaks in the week of Dr. Fauci’s announcement at a business show on CNBC that “good public health has limited outbreak in the US” and is briefly discussed for the following 2 weeks. The topics ‘US White House task force’ and ‘USA President response’ peak sharply in the 9th week, as President Trump, Vice President Pence, and members of the coronavirus task force gave a press conference on February 26th. The topic ‘tax relief and relief funds in USA’ shows a peak in the 13th week, coinciding with the U.S. President signing the Coronavirus Aid, Relief, and Economic Security Act on March 27th. The topic ‘USA primary elections (Wisconsin)’ shows a low but continuous popularity from the 8th to the 16th week and peaks during the week of the elections. Finally, the topic ‘USA protests’ is spawned by the USA protest on 15 April, while the topic ‘US President claims disinfectants can cure’ is briefly popular during the week of the respective announcement.
Figure 17 shows representative events of the Outbreak category. The topic ‘Diamond Princess cruise ship outbreak’ shows an increase in popularity during the 6th week and peaks during the 8th week, when passengers were evacuated after a 2-week period of continual announcements of new positively tested passengers. On 2 March, Saudi Arabia confirmed its first case, a Saudi national returning from Iran via Bahrain, and this appears linked to a peak one week earlier in the topic on the neighboring ‘Arabic countries outbreak’. On 27 March, Britain’s Prime Minister Boris Johnson tested positive, was subsequently admitted to hospital, and was discharged on 12 April, which coincides with the popularity of the topic ‘UK Prime Minister infection’ peaking in the 13th and 15th weeks.
The time evolution of the popularity of the topic ‘toilet paper and other supplies panic-buying’ is shown in Fig. 18. The 5 countries where this particular topic was most popular were Australia (2% of the overall tweet corpus of this country), Hong Kong (1.7%), New Zealand (1.5%), Ukraine (1.4%), and the United Kingdom (1.3%) (considering countries with more than 10,000 tweets), so related events or announcements from these countries are also depicted in Fig. 18. The popularity of the topic starts increasing during the 8th week, when armed robbers stole hundreds of toilet rolls in Hong Kong. The increase continues the next week, as an Australian toilet paper company announced on 26 February that it had completely sold out of stock, and a supermarket rush evolved alongside the first confirmed case in New Zealand. The popularity of this topic peaks during the 10th week, when, on March 6th, the British Health Secretary urged the public to stop panic-buying; a slow decline follows, together with the Ukrainian announcement of quotas for shoppers.