Since it emerged as a global threat in early 2020, the COVID-19 pandemic has affected health, human functioning and society on an unprecedented scale. The global spread of the virus in the absence of vaccines and effective treatments demonstrates the importance of effectively using non-pharmaceutical interventions (NPI) such as social distancing to reduce transmission of the virus, limit mortality and avoid overwhelming local healthcare systems [1]. Two strategies were used in most nations: quarantine of infected persons and social distancing to mitigate the spread of the virus [2,3,4,5]. Effective implementation of containment and social distancing strategies requires social trust, given the threat of massive disruption to society and the economy [6].

In response to the rapid spread of COVID-19, many nations mandated that all but essential businesses be shuttered and that individuals “shelter in place” to reduce the risk of transmission of the highly contagious virus. In Italy, one of the first countries to be severely hit by the wave, the “#I-stay-home” campaign obliged citizens to avoid leaving their homes. This effort and similar programs in other nations require trust and public consensus to engage a nation’s citizens as active co-participants in their own and their fellow citizens’ health and well-being [7].

At the time of submission, almost 3 million cases of infection and nearly 100,000 COVID-19 related deaths had occurred in Italy. The effectiveness of measures such as social distancing to reduce the spread of the virus depends on the level of social trust and collective societal action that is supported by integration among key groups such as citizens, institutions, information providers and elected officials [8]. Artificial dichotomies between the need to contain the spread of the virus and the need to maintain the health of the economy, conflicting themes in public and social media, and the lack of a unified message can undermine citizen buy-in, social trust, public compliance, and the speed and effectiveness of implementation.

Social trust and precise messaging are key in the current efforts to address an unprecedented challenge to the healthcare systems of nations. They are needed to inform public perceptions and contribute to a developing regional or national consensus that helps leaders and policymakers to coordinate transparent and consensus-based efforts to adopt country-wide social distancing measures such as closing schools, banning mass gatherings, and isolating individuals with the virus and their contacts. These efforts were shown to be effective in containing the spread of the Spanish Flu in 1918 [9].

In this paper, we explore the content and messages in social media communications during the early stages of the spread of the COVID-19 virus in Italy; the corresponding case numbers are reported in Appendix 1 (Table 4). The aim is to better understand how social media dialogue can affect, and be used strategically in, the adoption of large-scale regional and national social distancing measures to prevent the spread of the virus.

Literature review


The World Health Organization Influenza Pandemic Plan of 1999 pays considerable attention to the role of non-pharmaceutical public health interventions in containing or delaying the spread of a new influenza virus [10]. NPI include early case isolation, social distancing, the use of face masks, and the closing of schools and businesses [10]. The application of NPI proved to reduce the spread of the COVID-19 virus in several areas inside China [11,12,13,14]. However, to be effective, NPI require that authorities agree in advance on a range of containment strategies and that the population be informed and willing to adopt the necessary measures [10].

Analyzing the NPI applied during the influenza pandemic of 1918, Whitelaw [15] wrote: “To sum up, it is evident, that no public health law, which has not the endorsation and support of the public generally, can ever be reasonably well enforced.” More recently, the WHO [16] wrote: “Some of the lessons learned from the 2003 severe acute respiratory syndrome (SARS) epidemic can be applied to influenza, including the success of public campaigns to encourage self-recognition of illness, telephone hotlines providing medical advice, and early isolation when potential patients seek health care.” Several variables have proved necessary to gain public endorsement for the application of NPI, such as the perceived risk, the severity of the consequences and the response efficacy of the adopted measures [17, 18]. Therefore, while NPI have proved to be effective in limiting the spread of a pandemic, there must be public endorsement of their use. Giving people the right information is essential to empower them to evaluate their risks and the importance of curtailing their freedoms to limit the spread of the virus.

Emergency management and social media communication

The development of social media has changed communication in terms of both information availability and flow. The collaborative generation and dissemination of many types of content is one of the most distinctive features of social media. According to Brynielsson et al. [19] “Within the field of crisis communication, social media possibilities such as online sharing and social networking have had an impact on the way crisis information is disseminated and updated.” Among the many social media platforms, Twitter has been widely used in the emergency management literature due to its specific features. For example, Twitter allows users to post comments that are visible to all audiences and also to target a specific audience directly through the mention and reply functions [20]. The hashtag feature might help support the rapid building of an issue around specific community problems or geographical areas [21].

Research on emergency management shows that Twitter has been used to improve situational awareness among communities [22]. It can inform local communities through emergency alerts [19], and can act as a tool to facilitate social and political trends for change during emergencies when emotions embolden people [23]. Despite this great potential, the unchecked and socially constructed nature of messages shared on Twitter might lead to disinformation, contributing to the infodemic problem [24]. For example, Panagiotopoulos et al. [25] discuss the social amplification or reduction of risks that, on the one hand, might be caused by the Twitter flow and, on the other hand, could be monitored by those responsible for risk management. Similarly, Surian et al. [26] used Twitter discussions about human papillomavirus vaccines for clustering opinions and detecting risks for public health.

The COVID-19 emergency and the Infodemic

COVID-19 is a global emergency “which started in Wuhan in China in early December 2019, brought into the notice of the authorities in late December, early January 2020, and, after investigation, was declared as an emergency in the third week of January 2020” [27]. At the time of writing, COVID-19 has killed almost 2.5 million people worldwide. However, just a few months earlier, the nature and danger of the virus were hotly contested. The US Surgeon General, Jerome Adams, tweeted on February 1st, 2020: “Roses are red, violets are blue, risk is low for #coronavirus, but high for the #flu” [28]. On March 9th, 2020, US President Donald Trump tweeted: “Last year 37,000 Americans died from the common Flu. Nothing is shut down, life and the economy go on... Think about that” [28]. When it became clear that the situation was much worse, he commented on his previous Twitter statements: “circumstances change but it was a true statement at the time it was made” [28]. Therefore, the COVID-19 emergency differs from other emergencies in that the real risks were largely unknown, or at least debated, in the early stages of the pandemic.

The development of the COVID-19 pandemic demonstrates the spread of fake news, that is, false information based on unchecked facts [29]. In March 2020, a poll conducted by YouGov and the Economist revealed that 13% of Americans believed the COVID-19 crisis was a hoax, while even world leaders’ social media posts had to be deleted for spreading misinformation about the Coronavirus [29, 30]. The spread of false and unchecked information during the COVID-19 emergency, recently named an infodemic [24], is peculiar compared to other crises. The limited scientific knowledge available and the lack of a developing consensus among the population increased the initial spread of the virus due to the specific nature of the NPI required.

Twitter has proved to be “the dominant social reporting tool to spread information on social crises” [31]. Previous studies employing crisis and emergency risk communication models are based on the monitoring of the risks and the communication of warnings [32] to avoid social amplification of the risks [33]. However, the COVID-19 emergency represents a new context, where little knowledge was available at the beginning of the crisis on the real dangers. Understanding how communications flow on Twitter, shaping the community understanding of the risks in a situation where there is little or debatable knowledge on the dangers appears, therefore, central.


Our analysis included three steps. First, we used topic modelling methods to explore the main topics in messages by five groups with regular Twitter communication and sizable numbers of followers: institutions, news sources, elected officials, scientists and social media influencers. Second, we used social network analysis to assess the size and reach of social networks and identify boundary spanning opportunities (sources and messages that span social networks) [34]. Third, we conducted a chi-square trend analysis of the impact of the mounting crisis on the themes in social media messages.

Data collection

We downloaded tweets posted on the topic of COVID-19 infection in Italy from February 11th to March 10th, 2020. A tweet is an online posting created by a Twitter user, limited to 280 characters or fewer. Once published, the tweet appears on the Twitter home pages of all users who follow the individual who released the message. Users might retweet messages, amplifying selected messages and extending the spread of certain discussions. Twitter is the most heavily used micro-blogging platform in the world and provides access to its data. Although Twitter represents only a part of available social media, a number of studies have used Twitter data, with studies showing it is a reasonable proxy and representation of political, social and scientific opinions [35, 36].

We selected tweets based on their contents using both keywords and the hashtags: virus, Coronavirus, and COVID-19. Other keywords, such as SARS-CoV-2, were excluded since the tweets mentioning those words were few and also contained the word “virus”. We received messages tweeted in Italian from the Twitter company and focused on the top retweeted messages, using an inclusion criterion that covered more than 50% of the total retweets and ignored messages that did not attract attention from users. We used only the number of retweets as a metric of virality because our interest was in examining the infodemic phenomenon, and therefore in the diffusion of the messages rather than in the users’ reactions (e.g., likes, feelings, comments and replies).
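The inclusion criterion can be illustrated with a minimal sketch. It assumes one possible reading of the rule: messages are ranked by retweet count and kept until they jointly account for more than half of all retweets. The function name, the tuple layout and the toy counts are hypothetical and are not taken from the study data.

```python
def select_top_retweeted(tweets, share=0.5):
    """Keep the most retweeted messages until they jointly exceed
    `share` of all retweets (assumed reading of the inclusion rule).

    `tweets` is a list of (tweet_id, retweet_count) pairs.
    """
    total = sum(count for _, count in tweets)
    if total == 0:
        return []
    selected = []
    running = 0
    # Rank messages from most to least retweeted.
    for tweet_id, count in sorted(tweets, key=lambda t: t[1], reverse=True):
        selected.append((tweet_id, count))
        running += count
        # Stop as soon as the kept messages cover more than the target share.
        if running > share * total:
            break
    return selected


# Hypothetical toy data: five messages with their retweet counts.
toy_tweets = [("t1", 50), ("t2", 500), ("t3", 100), ("t4", 300), ("t5", 50)]
kept = select_top_retweeted(toy_tweets)
```

In this toy case the single most retweeted message covers exactly half of the retweets, so a second message is needed before the threshold is exceeded.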

Data analysis

We analyzed the content in the data using Python (Python Software Foundation) and its topic modelling functionality to detect the main topics discussed in the messages through computer-aided content analysis [37]. Content analysis provides a useful and multifaceted methodological framework for Twitter analysis and supports the structuring of textual data by enabling categorizing and coding [38]. Within content analysis, topic modelling is a type of statistical modelling for discovering abstract “topics” that occur in a collection of documents or, as in our case, tweets. The Latent Dirichlet Allocation (LDA) approach was used to classify and code text into particular topics [39].
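To make the LDA step concrete, the following is a minimal sketch of a collapsed Gibbs sampler for LDA written with the Python standard library only. It is an illustration of the general technique, not the implementation used in the study: the text does not state which library, preprocessing or hyperparameters were used, so the function name, the values of `alpha` and `beta`, the number of iterations and the toy tokenised documents are all assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns per-document topic counts and the top words of each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token from the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    top_words = [
        [vocab[v] for v in sorted(range(n_words), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return doc_topic, top_words


# Hypothetical tokenised "tweets" (not study data).
toy_docs = [
    ["school", "closure", "lockdown"],
    ["school", "lockdown", "closure", "school"],
    ["economy", "business", "support"],
    ["business", "economy", "support", "economy"],
]
toy_doc_topic, toy_top_words = lda_gibbs(toy_docs, n_topics=2, n_iter=50)
```

Each row of `toy_doc_topic` counts how many tokens of a document were assigned to each topic, and `toy_top_words` lists the most frequent words per topic; in the workflow described above, lists of this kind are what the authors would then code manually.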

The original list obtained from the statistical analysis was then manually coded by the authors (MM, PT, and MLT). The emerging codes were circulated among the researchers, and the list of codes was included in a codebook. Several conference calls/meetings were held to fine-tune the codebook and to group codes that related to the same phenomena. We further analyzed the data until conceptual saturation was reached and no new codes or categories were generated or merged [40]. In addition, we manually coded the senders of the most retweeted messages, using the description provided by the users themselves in the presentation of their account and an open coding approach [41]. In some cases, when the account’s presentation was not enough to define a sender, we searched for his/her profession or role on the web. This coding approach means that we created new codes according to the senders’ descriptions of their accounts, thereby creating categories that reflect the types of actors. We iterated the aggregation and creation of codes until reaching conceptual saturation with significant categories of actors. We thus aggregated senders of tweets into five distinct categories: Institutions (e.g., messages from the government or the Italian NHI), News sources (e.g., messages from TV channels or journalists), Politicians (e.g., messages from personal accounts of politicians or political parties), Science Sources (e.g., messages from scientists), and Influencers (i.e., all the other influencing users, including V.I.P.’s, celebrities, and private users who accounted for a large number of retweets, using a cut-off point of 1400 retweets).

We employed a chi-square test of independence with standardized residuals to search for similarities and differences in topics discussed by source (e.g. topics mainly discussed by institutions, politicians, etc.) using R software [42].

As a second step, we analyzed the development of discussions and messages over time. We chose three periods: a) before February 24th; b) from February 25th to March 1st, when the number of infected individuals exceeded 200, and few regions of the country had implemented social distancing; and c) between March 1st and March 10th, when the entire country was in lock-down. The numbers related to the daily spread of the disease are reported in Appendix 1 (Table 4). The social network map was produced with the ForceAtlas2 algorithm [43] in Gephi, an open-source software for graph and network analysis that measures the relationships and flows between people, groups, or organizations [44, 45]. The layout provided by the software supported the grouping and alignment of connected nodes and helped to determine the current community state of social networks and to identify boundary spanning opportunities.
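As an illustration of how retweet data can be prepared for a tool such as Gephi, the sketch below collapses retweet events into a weighted, directed edge list and writes it as a CSV file with `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet importer recognises. The text does not describe how edges were defined in the study; linking each retweeting user to the original author is an assumption, and the function names and account names are hypothetical.

```python
import csv
import io
from collections import Counter


def build_edge_list(retweets):
    """Collapse (retweeter, original_author) pairs into weighted edges.

    Self-retweets are dropped; the weight is the number of retweets
    from one account to another.
    """
    weights = Counter((src, dst) for src, dst in retweets if src != dst)
    return sorted((src, dst, w) for (src, dst), w in weights.items())


def write_gephi_csv(edges, handle):
    """Write edges in the Source/Target/Weight layout used for CSV import."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)


# Hypothetical retweet events: (retweeting user, original author).
toy_retweets = [
    ("user_a", "scientist_x"),
    ("user_b", "scientist_x"),
    ("user_a", "scientist_x"),
    ("user_c", "news_y"),
    ("news_y", "news_y"),  # self-retweet, ignored
]
toy_edges = build_edge_list(toy_retweets)

buffer = io.StringIO()
write_gephi_csv(toy_edges, buffer)
```

The layout itself (for example ForceAtlas2) would then be computed inside Gephi on the imported edge list.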

As a third step, a chi-square trend analysis was employed to search for linear trends between the progression of the COVID-19 crisis and the number of retweets from each source (i.e., influencers, institutions, news, politicians, scientific sources) and for each topic. The total number of retweets was also analyzed and compared to available COVID-19 morbidity and mortality data.
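A chi-square test for linear trend across ordered periods can be sketched as follows. The text does not specify which trend statistic was computed, so the sketch assumes the Cochran-Armitage form, with equally spaced scores for the three periods; the function name and the counts are hypothetical. With one degree of freedom, the p-value equals `erfc(sqrt(chi2 / 2))`.

```python
import math


def chi_square_trend(successes, totals, scores=None):
    """Cochran-Armitage chi-square test for linear trend (1 df).

    successes[i]: retweets on the topic (or from the source) in period i
    totals[i]:    all retweets in period i
    scores[i]:    ordinal score of period i (defaults to 1, 2, 3, ...)
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n = sum(totals)
    p = sum(successes) / n  # pooled proportion under the null hypothesis

    # Score-weighted deviation of observed from expected counts.
    t_stat = sum(s * (r - m * p) for s, r, m in zip(scores, successes, totals))
    # Variance of the statistic under the null hypothesis.
    variance = p * (1 - p) * (
        sum(m * s ** 2 for s, m in zip(scores, totals))
        - sum(m * s for s, m in zip(scores, totals)) ** 2 / n
    )
    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return chi2, p_value


# Hypothetical counts: a topic whose share of retweets falls over three periods.
toy_chi2, toy_p = chi_square_trend(successes=[50, 30, 10], totals=[100, 100, 100])
```

A large statistic with a small p-value indicates that the share of retweets devoted to the topic changes monotonically across the periods; the sign of the score-weighted deviation gives the direction of the trend.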


Topic analysis

Our data encompassed 74,306 messages that were retweeted more than 1.2 million times from a total number of 2.3 million assessed retweets. The data analysis revealed 14 major themes that were intensely discussed by the five groups. Table 1 reports the main topics discussed, with examples of each type of message, the number of retweets and the keywords used in the classification process. The chi-square analysis revealed significant differences between the topics discussed by each group (χ2 = 8437.5, df = 52, p < 0.001). We produced a double-entry table (topics on rows and actors on columns) and compared the observed counts with the counts expected under independence. The difference between the observed and the expected count in each cell was then divided by the square root of the expected count to obtain the Pearson residuals.
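The residual calculation described above can be sketched in a few lines of Python. The function name and the toy table are hypothetical; the sketch computes the expected counts under independence, the Pearson residual (observed minus expected, divided by the square root of the expected count) for each cell, and the chi-square statistic as the sum of squared residuals. For a table of 14 topics by 5 actor types, the degrees of freedom are (14 - 1) × (5 - 1) = 52, which matches the value reported above.

```python
import math


def pearson_residuals(observed):
    """Chi-square test of independence with Pearson residuals.

    `observed` is a list of rows (topics), each a list of counts per
    column (actor types). Returns (chi2, df, residuals).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / N.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Pearson residual per cell: (observed - expected) / sqrt(expected).
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]
    chi2 = sum(res ** 2 for row in residuals for res in row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


# Hypothetical 2 x 2 table of retweet counts (rows: topics, columns: actors).
toy_chi2, toy_df, toy_residuals = pearson_residuals([[10, 20], [30, 40]])
```

A positive residual marks a cell with more retweets than expected for that topic and actor combination, and a negative residual marks fewer than expected.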

Table 1 Themes, Categories, Codes, and Quotes from Most Retweeted Messages

The topic analysis was carried out using the chisq.test function in R. The results are shown in Fig. 1. Positive residuals are coloured in blue, indicating an attraction (positive association) between the corresponding row (topic) and column (actor). Negative residuals are coloured in red, indicating a repulsion (negative association) between the corresponding row and column variables. The results show that influencers had higher standardized residuals, suggesting a higher than expected number of messages, for messages that expressed fear of foreigners and blamed immigrants in Italy for starting the COVID-19 outbreak. Politicians had higher residuals (suggesting higher than expected numbers of messages) for messages connected with managing the economic fallout and with supporting citizens, businesses and hospitals during the crisis. Not surprisingly, infection risks and rates and epidemiological information commonly originated from Scientific sources. In contrast, News sources were mainly concerned with the closing of entertainment venues, restaurants, schools and universities, with identifying early cases of infection and with highlighting the slowdown in the economy. The Institutional sources had a higher propensity for information and guidance directing the behaviour of citizens.

Fig. 1 Topic Analysis by Actor Type

Actor type relevance

The findings of the Social Network Analysis suggested a prevalence of specific messages during the three periods (Fig. 2). During the first time-period, messaging was dominated by influencers, with several prominent actors that attracted and guided the national discourse. The results demonstrated, however, that during the most critical days, February 19th and 20th, when the first Italians tested positive for COVID-19 [46], the average percentage of retweets for influencers fell from 55% to 25% of the daily total, with scientific sources rising from an average of 8% to 42–48% of total tweets. The second time-period shows that the news channels and broadcasters were receiving more attention, but influencers were still relevant and often undermined the scientific messaging. During the third time-period, scientific sources began to dominate the discussion, building public confidence with a messaging flow that was topically congruent and connected to news sources, with both emphasizing key messages for dealing with the pandemic.

Fig. 2 A Social Network Graph Based on Tweets

Topic and actor type relevance during the crisis development

The chi-square trend analyses follow the three main periods previously discussed (Tables 2 and 3). During the three time-periods, the infection rates were increasing, with the total number of cases moving from three cases on Feb 11th to 1694 cases on March 1st, and to 10,142 cases on March 10th. The results showed that some topics dropped dramatically in their trending as the crisis intensified. These included tweets promoting anti-immigrant propaganda and fearmongering against foreigners. Other topics, particularly science-based and practical information, grew in their relative importance and urgency. The reduced messaging by influencers made room for a range of sources to contribute solutions and build confidence. These included scientific sources, politicians and institutions, who collectively contributed to messaging that sought to build social trust and community activation.

Table 2 Chi Square Trend Analysis of the Key Topics in the Three Time Periods
Table 3 Chi Square Trend Analysis of the Key Actors in the Three Time Periods


This paper contributes to the significant body of literature examining the COVID-19 pandemic. Our results show the importance of social media in supporting a community-wide and ultimately nationally-coordinated effort to build public awareness and engagement during the COVID-19 pandemic. Analysis of social media themes highlights useful and damaging messages, including false claims that blamed COVID-19 on foreigners. Interestingly, several actors without a scientific background encouraged discussions about how best to prepare for COVID-19. In contrast, contentious and dissenting voices might slow the process to reach a fact-driven consensus, and even promote counterproductive actions such as downplaying the danger of public crowding.

Twitter and other social and digital means of communication have become essential channels for physicians and scientists to spread health and public health information [47, 48]. Twitter proved to be a powerful knowledge translation tool for transferring meaningful knowledge from healthcare authorities to the population about what should be done or avoided [49, 50]. Twitter feeds ultimately benefitted the global community during the pandemic by serving as a readily accessible and trusted source for reliable and science-based information [48]. COVID-19 also highlighted the danger of a serious infodemic [51], with an over-abundance of information of uncertain accuracy, making it difficult for individuals to select the sources for actionable information and guidance.

Political actions and actors affected the spread of the coronavirus by denying the realities of COVID-19 and promoting social interactions under the motto “let’s keep our habits, we can’t stop Milan and Italy.” These public actions effectively helped spread the virus, and their proponents back-tracked only days later as the number of people affected by COVID-19 increased dramatically and the mortality of the pandemic emerged [52]. Some tweets contained messages promoting fear and falsely blaming foreigners for the illness, and conveyed a false picture that underestimated the severe impact of COVID-19. These actions undermined social trust and preparedness, exacerbating a general sense of fear and panic across Italy as the pandemic was fast becoming a debilitating national emergency.

There are many lessons to be learned by other nations from the experience of social media messages in Italy. Paramount is the importance of clear, convincing, fact-based and actionable messaging to overcome misinformation and to garner the trust of the public during a challenging time. A handful of countries, including Singapore, Taiwan, Germany and Iceland, have managed to stay on top of their outbreaks by adhering to radical transparency and promoting community activism, while aggressively testing to find cases, quarantining contacts, and keeping viral transmission from going into an exponential growth phase [53]. Taiwan’s early recognition of the crisis, daily briefings to the public, and simple health messaging allowed the government to reassure the public by delivering timely, accurate, and transparent information regarding the evolving epidemic [54]. Additionally, the lessons from China and Singapore show that even after NPI have reached the target of limiting the spread of COVID-19, newly emerging cases can require the reintroduction of containment measures. Timeliness in accurate public messaging, using both smart technology and traditional press conferences by trusted leaders, was crucial.

Our study has inherent limitations. First, while Twitter has become an essential platform for textual communication and information sharing, tweets represent only a sample of people’s communication and human interactions. However, there are many studies using Twitter data that consider it a reasonable proxy of the user’s mental models [42, 55]. Second, tweets exhibit specific characteristics of brevity, fluidity, and meaning embedded in a broader context, which can pose challenges for researchers engaged in content analysis. Third, the statistical methods we applied may be appropriate but are as yet unvalidated for analyzing Twitter key messages, key actors, and their evolution over time. Fourth, the COVID-19 pandemic and its rapidly evolving social, psychological, economic, and geographical attributes mean that the analysis is accurate only in the context of the limited time frame being studied. Covariates may well alter the findings and statistical results depending on where the methods are applied. Further development of statistical procedures which can be validated and replicated is needed with this type of data. Fifth, our analysis focused only on tweets posted in Italy. The different spread of social networks among generations or genders in different cultures and countries might affect our results. Finally, we focused on a limited period of time. Extending our analyses could have led to different findings.


Societies need to respond quickly to pandemics to protect the health and well-being of their citizens. In this exploratory work, we developed a systematic method to analyze Twitter messages and to understand key messages, key actors and their evolution over time. We showed that social media could be used effectively to respond to the pandemic through transparent and convincing messages, rooted in scientific knowledge, that help build confidence and improve the effectiveness of policy implementation, serving as an effective knowledge translation tool to facilitate communication with the population. Despite the infodemic and the threats from fake news, trolls and bots that automatically produce and share content on social media, the scientific voice and other institutional sources of information were able to dominate and spread among people during the acute phase of the outbreak, thereby gaining their trust and public engagement in facing the pandemic. Countries that are transparent about the state of their country and provide truthful health information for their citizens will likely gain public trust and more rapid NPI uptake and compliance. Finally, we believe that an area for future research entails examining how social media and other readily collected public data could be leveraged to improve methods for public messaging, assess the spread of the virus and support appropriate public health actions. Twitter can be leveraged to improve population health preparedness, enable a better and earlier public response, and support public policy actions [56].