1 Introduction

Starting at the beginning of 2020, the world experienced the first major pandemic since the Spanish Flu after the First World War. This had a profound effect on the subsequent years, as it forced more or less the entire global population to deal with restrictions that most people had never experienced before directly, and possibly only read about in history books or watched in fictionalized form in historical movies. But aside from bringing back public health policies and scenes that had been unseen in Europe (and more generally in the Western world) for over a century, at least at such scale and intensity, the COVID-19 pandemic had another, and possibly more important, point of interest. Indeed, the coronavirus emergency was the first time in history that a pandemic occurred in an information-intense and hyper-connected society, where information travels at uncanny speed throughout the world, and a large share of the population can observe in real time what is happening thousands of kilometers away.

This makes the COVID-19 pandemic a unique moment in the history of mankind, since for the very first time authorities had to fight not only the spread of a virus but also the uncontrolled spread of news and the consequent misinformation, which perhaps caused more damage than COVID-19 itself (Ferrante et al. 2023; Lisboa et al. 2023; Soto-Vega et al. 2024). Some authors talk in this regard about an “infodemic”, which, according to the World Health Organization, is “too much information including false or misleading information in digital and physical environments during a disease outbreak [which] causes confusion and risk-taking behaviours that can harm health [and which] also leads to mistrust in health authorities and undermines the public health response” (Footnote 1).

This is in some ways another example of one of the major paradoxes of our times: extraordinary abundance, an “excess” of goods (lato sensu), may create harm rather than bring benefits. The effects of this paradox are there for all to see in the cases of food, energy, and many others, and the same seems to hold for information. Indeed, technological advancement has made information easier to spread, which brings many benefits that should not be overlooked (such as the reduction of transaction costs and information asymmetries), but it has also made it easier to produce and spread fake news and conspiracy theories.

Some authors have contested the term “fake news”, proposing that it be replaced with “disinformation” (Pérez 2019), on the grounds that the latter better captures the range of false information online, from propaganda to hoaxes to manipulated media, and that the term “fake news” has been politicized and is itself misleading. We believe that this position emphasizes the nature of fake news as intentional hoaxes, since the main difference between disinformation and misinformation is that the former implies the voluntary spread of information the sender knows to be wrong, and hence a specific intent to mislead, while the latter does not (Footnote 2). Understanding this distinction is crucial when addressing the challenges posed by inaccurate information in our interconnected world. In this study we treat the spread of fake news as misinformation, since we are skeptical of the claim that people who spread false information online do so only on purpose, rather than because they actually believe it. Furthermore, it seems clear that this type of information now proliferates at an impressive rate and blends effectively with legitimate news, making it sometimes hard, at least for laymen, to separate the sheep from the goats.

The main factor that has enabled the rapid spread of information (whether verified or not), and consequently of fake news, is the rise of social media and online platforms (Harriss and Raymer 2017), which has been suggested as impacting the demand for healthcare (Amaral-Garcia et al. 2022). The sources, means of generation, and types of fake news are diverse, as are the motivations that underlie it (Narwal 2018). Fake news can be created for profit, or out of ideology, or out of pure mischief (Dennis et al. 2021); nonetheless, it is often financially or politically motivated. It has been suggested that fake news spreads faster and further than the truth (Dennis et al. 2021). Some studies suggest that fake news tends to spread differently from real news, through self-loops, where users repost their own shares (Zhao, 2021), showing once again the instrumental role played by the Internet in propagating this form of news. Having said that, the phenomenon soon leaves the online realm and has a significant social impact in the real world as well as the virtual one, influencing major events like political elections and causing real-world harm (Dennis et al. 2021). This mechanism has been further examined by Ecker et al. (2022), who delve into the multifaceted landscape of misinformation with their Information Deficit Model, showing that correcting misinformation does not always lead to complete revision, and raising important concerns about fairness, equity, and trust. Their insights have implications for public health and policy-making as well. There are hence reasons to believe that the diffusion of the Internet, as the main habitat of this misinformation, may also play a role in the case of a pandemic, such as COVID-19, contributing to bad habits that increase the contagion rates.

All these pieces create an interesting puzzle for scholars interested in the intertwining of and interconnections between the pandemic and “infodemic” dynamics. More precisely, one might ask: does greater access to the Internet contribute to solving the health crisis, or, on the contrary, does it worsen it? And does the freedom of the Internet play a role in this dynamic? Indeed, on the one hand, it is possible to believe that countries where citizens have greater access to information through online technology, with free media and broadcasters, are more truthful in showing the evolution of COVID-19, especially in terms of cases and deaths, and hence are better at creating informed citizens and implementing more efficient and responsive non-pharmaceutical interventions (NPIs). Indeed, given that their scale of implementation makes it impossible for the police to enforce them effectively, for these personal distancing policies to have an effect they require voluntary uptake among the public, especially in the Western world where a certain degree of personal liberty is considered to be inviolable by public authorities (Alfano 2022a and b). Following Besley and Dray (2020), one may argue that a free and independent media allows citizens to be better informed about the pandemic, contributes to reducing misinformation, and makes governments more accountable, all arguments that suggest that more freedom of information contributes to fighting the pandemic. It has also been suggested that the reason initial studies reported that autocracies were more efficient in dealing with the virus, when compared with the performance of democracies, was due to such authoritarian governments’ control and manipulation of reported data, and hence their countries’ lack of freedom in the media (Badman et al. 2021; Chen 2020; Laishram and Kumar 2021).

On the other hand, it is important to recognize that together with older forms of media, in recent years the Internet has offered a platform to anyone with a connection. This has led to the creation of alternative media, and the border between information and entertainment, already eroded by the development (Brants 1998) and later commercial success of infotainment (Thussu 2007), has become even more blurred. This is possibly the natural continuation of a long-term crisis of public communication (Blummer and Gurevitch, 1995), where horizontal broadcasting, from many to many, makes it difficult for the receiver to assess the credibility and authority of the source, and distinguish between senders of varying quality. As a result, the literature has already suggested that electronic media has played an important role in the spread of information regarding COVID-19 (Mughal et al. 2022). Some studies (Bridgman et al. 2020; Lee et al. 2020) have highlighted how social media platforms (like Twitter) contain more COVID-19 misinformation than traditional news media, and that exposure to social media is associated with believing in COVID-19 myths, while on the other hand consuming news media is linked to more accurate knowledge of the virus. All of this has led scholars to question the role the Internet has played in the recent COVID-19 pandemic. Did the availability of Internet access among the public, and different degrees of governmental control on what was published, affect the unfolding of the epidemic?

In other words, summarizing the arguments sketched so far, we might ask: which of these opposite mechanisms had the upper hand? Did greater access to the Internet among the public reduce the spread of the COVID-19 virus by creating more informed citizens who had knowledge about the NPIs in place that needed to be respected, the precautions to be taken to avoid unnecessary risks, and the real danger of the emergency? Or, on the contrary, did having greater access to the Internet, and possibly more freedom within it, contribute to the spread of misinformation and fake news, hence creating misinformed citizens, and in this way providing the perfect habitat for the virus to spread, thus leading to worse public health?

The objective of this study is to address these research questions through a quantitative framework and by exploiting a cross-national setting, with the use of macro-level data. It could be argued that it would be optimal to use individual-level data, which would furnish insights into both inclinations toward Internet use and compliance with NPIs. Regrettably, to the best of our knowledge, access to such data remains elusive, particularly in an adequately large and globally comprehensive sample, which would offer a considerable level of external validity, conditions that a small survey would hardly satisfy. The alternative course of action, which we adopt in this study, involves resorting to macro-level data, entailing the analysis of the trajectory of the spread of COVID-19 within nations characterized by varying degrees of Internet use among their populations, and varying degrees of Internet freedom. While it is important to highlight that this approach has certain merits regarding usable data volume and inclusiveness of different countries all over the world, meaning that in consequence the results obtained in such a heterogeneous sample have a greater degree of generalization, it is not exempt from certain limitations (most notably the potential for the ecological fallacy, a matter that will be discussed in the conclusion).

The rest of the study is organized as follows. After this introduction, the next section discusses the background literature and states the research questions, while section three presents the data and the methodology adopted in the empirical analysis. Section four presents the results, while the last section, as usual, concludes.

2 Background and Research Questions

Previous research on the impact of Internet access on the spread of COVID-19 presents mixed results. Some studies suggest greater Internet access can help reduce the spread of the virus. What are the mechanisms that could lead to this outcome? One argument is that proposed by Barna (2020), who argues that Internet access is crucial for people to stay connected while sheltering in one place, implying that Internet availability could help to ensure stay-at-home orders are complied with, and hence more effective in halting the diffusion of the virus (Alfano and Ercolano 2020). Feldmann et al. (2021) work along the same lines, showing in their research that increased Internet traffic and connectivity during lockdowns allowed many people to work, learn, and socialize from home, following the recommended safety protocols and thus contributing to reducing the spread of the virus.

Another idea present in the literature is that technology can constitute an effective tool that helps authorities to trace and communicate with citizens, hence giving institutions a faster and more effective answer to the pandemic. In this line of research, the work of Verma and Mishra (2020) is especially interesting. They explain how mobile phone apps have provided guidelines and safety tips to help curb transmission. Similarly, Singh et al. (2021) provide a review of how new technologies (such as AI, drones, and Internet of Things) can help with screening, contact tracing, and sanitation to combat the virus. Higgins and his coauthors (2020) show how one might obtain real-time data about the pandemic by leveraging databases such as Google Trends, proposing a digital epidemiology approach. Search terms for shortness of breath, anosmia, and other words or phrases related to COVID symptoms had strong correlations to both new daily confirmed cases and deaths from COVID-19. This is a finding also suggested by the research of Bento et al. (2020), and by the retrospective study of Li et al. (2020), who all found it useful to harvest data from Internet research to obtain a real-time snapshot of the epidemiological situation. Also, Li and Liu (2020) found that social media was an effective tool for improving health literacy and promoting behaviors that would prevent the spread of COVID-19 among the public.

However, other research indicates that Internet access may have limited benefits against the virus, or even have a detrimental effect, increasing the spread of the contagion. Multiple studies have suggested that social media fueled the spread of COVID-19 misinformation (Bridgman et al. 2020; Cuan-Baltazar et al. 2020; Himelein-Wachowiak et al. 2021; Lee et al. 2020; Tasnim et al. 2020). This infodemic of false information undermined public health efforts to control the pandemic. Pius and his coauthors (2020) note that misinformation on social media hampered prevention efforts in some developing countries. Shirish et al. (2021) showed that greater mobile connectivity led to higher rates of fake news about COVID-19, allowing misinformation to spread. Social media platforms like Twitter contained more COVID-19 misinformation than traditional news media, and exposure to social media was associated with believing in COVID-19 myths and hoaxes, while consuming news media was linked to more accurate knowledge about the virus (Bridgman et al. 2020; Lee et al. 2020). These misperceptions, in turn, predict less compliance with recommended safety measures like social distancing (Bridgman et al. 2020; Lee et al. 2020).

Automated social media accounts, also known as “bots”, were partly responsible for amplifying COVID-19 misinformation. Indeed, up to 66% of known bots spread messages about the pandemic, though the origins and intentions of these bots remain unclear (Himelein-Wachowiak et al. 2021). Low “digital health literacy”, or the ability to find and understand online health information, made people more vulnerable to COVID-19 misinformation (Bin Naeem and Boulos, 2021). Because misinformation spreads rapidly on social media, it requires a coordinated response combining multiple strategies to address it (Bin Naeem and Boulos, 2021). Approaches like using fact-checkers, promoting media literacy, and regulating technology companies might have helped slow the spread of COVID-19 myths (Cuan-Baltazar et al. 2020; Bin Naeem and Boulos, 2021; Tasnim et al. 2020). Another important dimension of this dynamic is the level of freedom on the Internet. Indeed, on the one hand governments’ actions during the pandemic have sometimes compromised media freedom and restricted access to information. For instance, it has been suggested that in China restrictions on Internet freedom were part of the official cover-up that let COVID-19 spread globally (Footnote 3). On the other hand, it is important to underline that previous literature suggests that there is a relation between Internet freedom and the spread of fake news (Shirish et al. 2021). Balancing public health needs with fundamental rights remains a challenge during these unprecedented times.

Of course, while it may have facilitated some dynamic, either positively or negatively contributing to the spread of the virus, access to the Internet was not the only factor determining the spread of COVID-19. Salvador and her coauthors (2020) found that in 39 countries growth rates were higher in those with greater “relational mobility”, or tendencies to interact with new people. Also, Alfano suggested that social capital (2022a) and work ethics (2022b) may have been factors that affected compliance with NPIs. These studies, among others, indicate that cultural variability may also play a role that interacts with access to the Internet as a relational mediator.

In summary, while Internet access, by disseminating information and enabling distancing, was able to provide some benefits when it came to reducing the spread of COVID-19, the literature also suggests that Internet access may have had limitations or even unintentionally increased the spread of the virus, and greater Internet freedom and connectivity may have increased virus dissemination through misinformation and reluctance to follow recommended safety measures. Moreover, social media enabled the proliferation of COVID-19 misinformation, which is linked to riskier health behaviors and poorer knowledge about the virus. Bots and low digital literacy thus amplified this infodemic.

Hence, we may conclude that more research is needed to shed some light on the impact of Internet access and Internet freedom on the spread of a virus during a pandemic. This is important both to better understand the dynamics in place during the COVID-19 pandemic, in terms of the variable evolution of the disease in places that enforced very similar NPIs, but also, and possibly more importantly, to predict the effectiveness of NPIs among future publics with different degrees of access to the Web, taking the same variable into account. This is an important objective given that pandemics have been predicted to become increasingly common in the near future (Adamson et al. 2021; Hotez 2021; Simpson et al. 2020).

To avoid the introduction of biases to the study, we prefer to focus our empirical analysis on the first COVID-19 wave, where vaccines were not yet available, meaning that there were no differences in the evolution of the epidemic between countries due to different vaccination rates or access to vaccines. Moreover, at the time, many governments were inadequately prepared for the emergency, and hence had to react to the unexpected news without being able to rely on previous knowledge, something that is not true for the subsequent waves, when each country had a number of successful cases in recent history to take inspiration from. This implies that spillover effects, and copying from neighbors, are not biases that may hinder the estimation. To conclude, more formally we might state that this study aims to answer the following two research questions:

  1. RQ1: Did countries in which a greater share of the population had access to the Internet see a greater spread of the COVID-19 virus during the first wave of the pandemic?

  2. RQ2: Did the interaction between the share of the population using the Internet and the degree of Internet freedom have any specific effect on the COVID-19 trend?

3 Methods and Data

In the context of the first wave of the pandemic, i.e. from January to August 2020, the main determinant influencing the evolution of COVID-19 infections is generally considered to have been the implementation of NPIs (Alfano and Ercolano 2020, Alfano 2022a and b). Indeed, given the absence of effective pharmaceutical treatments and vaccines, the foremost means of curtailing contagion involved the imposition of social distancing measures. The existing literature underlines the reliance of NPIs on voluntary compliance within the public domain to attain efficacy; indeed, due to the nature of such policies, which are aimed at the entire population, enforcement through coercion and policing force seems impossible (Alfano 2022a and b), and hence their effectiveness is based primarily on individuals’ spontaneous cooperation. Consequently, the empirical differences observed in COVID-19 dissemination rates across different countries, when adjusting for varying degrees of NPI stringency, may serve as an alternative marker for observing the public’s compliance with these directives (Alfano and Ercolano, 2022; Alfano 2022a and b).

It is important to underline that this assumption is not unquestionable. It relies on the trustworthiness of the available data, and the existing literature acknowledges substantial divergences in the quality of reported data across different nations (Lloyd-Sherlock et al., 2021; Vasudevan et al. 2021). The quality of these data has been shown to be correlated to the level of democracy within a country (Annaka 2021). Hence, in a cross-country analysis incorporating nations from diverse backgrounds, the inclusion of democracy levels as a control variable is important, in order to preclude the inadvertent introduction of external biases due to different regimes in the estimation.

As is well known, the absence of complete data poses a challenge in formulating models for tracking the propagation of COVID-19, and an even bigger one when trying to perform this task on a global scale. Previous research has put forth data-driven models as a viable strategy for approximating contagion patterns, and hence measuring the effective influence of NPIs in this dynamic (Alfano and Ercolano, 2022; Alfano 2022a and b). In this setting, fixed effects models hold an advantage over random effects models in panel data analysis, because they account for all level 2 (here, country-level) attributes, whether observable or latent (Allison 2009; Halaby 2004; Wooldridge 2010). The literature emphasizes the importance of this advantage in devising an empirical strategy to model novel phenomena, where factors that remain static over the period studied (such as healthcare system attributes, various habits among the population, smoking rates, and other variables that are constant in a daily panel dataset spanning a single calendar year) tend to exert an influence. Moreover, at the time of writing, in the first quarter of 2024, not all determinants of the COVID-19 phenomenon are fully understood from a theoretical perspective. By employing fixed effects, the empirical estimation implicitly addresses variables omitted from the regression that remain invariant over time and that may affect the propagation of COVID-19.

At the same time, the benefits of fixed effects models come with a notable drawback when the objective is to estimate the influence of a time-invariant variable, such as the annual share of Internet users among the public (which, for evident reasons, is unavailable at daily-level granularity), within an empirical framework employing daily data: since such a variable is constant over time within each country, it is perfectly collinear with the country fixed effects, and its coefficient cannot be estimated. Prior studies (Alfano and Ercolano, 2022) have overcome this constraint by segmenting the sample into quantiles based on the variable of interest, and estimating its effect within discrete subsamples to subsequently compare coefficient magnitudes. This empirical approach has two principal limitations (Alfano 2022a): first, comparing estimated betas across disparate samples may yield inconsistent findings due to inherent biases and errors within distinct subsamples; second, partitioning the sample by quantiles of one variable also implies partitioning it by all variables strongly correlated with it, which makes it uncertain whether the estimated effect stems from the variable of interest or from omitted variables.
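The identification problem described above can be illustrated with a minimal sketch. It assumes a toy panel with hypothetical values and hypothetical field names (`stringency`, `internet_share`); it is not the authors' code. It only shows that the within (fixed effects) transformation leaves a time-invariant regressor with no variation to exploit.

```python
from collections import defaultdict


def within_transform(rows, key, var):
    """Subtract each country's own mean of `var` from its observations."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[var])
    means = {country: sum(vals) / len(vals) for country, vals in groups.items()}
    return [row[var] - means[row[key]] for row in rows]


# Hypothetical toy panel: two countries observed on three days each.
panel = [
    {"country": "A", "stringency": 20.0, "internet_share": 85.0},
    {"country": "A", "stringency": 50.0, "internet_share": 85.0},
    {"country": "A", "stringency": 80.0, "internet_share": 85.0},
    {"country": "B", "stringency": 10.0, "internet_share": 40.0},
    {"country": "B", "stringency": 10.0, "internet_share": 40.0},
    {"country": "B", "stringency": 40.0, "internet_share": 40.0},
]

# A time-varying regressor keeps its within-country variation...
str_within = within_transform(panel, "country", "stringency")
# ...while a time-invariant one collapses to zero for every observation.
net_within = within_transform(panel, "country", "internet_share")

print(str_within)
print(net_within)
```

Because every demeaned value of the time-invariant variable is zero, no slope can be estimated for it in a pure fixed effects regression, which is what motivates the alternative approaches discussed in the text.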

An alternative approach, previously adopted in the literature (Alfano 2022a and b), involves the integration of time-invariant variables into the analytical framework. This methodology, while retaining the advantages of fixed effects estimation for time-variant variables, obviates the need to segment the sample into quantiles. It relies on employing within and between effects that are estimated in random effects models (Allison 2009; Neuhaus and Kalbfleisch 1998; Rabe-Hesketh and Skrondal 2008; Raudenbush 1989; Wooldridge 2010), also known as hybrid models. As underscored by Schunck (2013), this empirical approach facilitates the inclusion of random slopes, permitting the estimation of effects associated with time-invariant variables that display variation across country clusters. Hence, following and augmenting the methodology previously used by Alfano (2022a and b), and Alfano and Ercolano (2020 and 2022), to assess the impact of NPIs on COVID-19 instances, a panel dataset was built, comprising daily data from a cohort of countries (a complete list is provided in Appendix 1). In more formal terms, the following Eq. (1) is estimated:

$$\begin{aligned}\varDelta i_{ct} &= \alpha + \beta_{1}\left(i_{ct-1}-\bar{i}_{c}\right) + \beta_{2}\bar{i}_{c} + \beta_{3}\left(Str_{ct-28}-\overline{Str}_{c}\right) + \beta_{4}\overline{Str}_{c} \\ &\quad + \beta_{5}Internet_{c} + \beta_{6}X_{c} + \beta_{7}T_{t} + \beta_{8}Cont_{c} + \beta_{9}\left(T_{t}\times Cont_{c}\right) + \epsilon_{ct}\end{aligned}$$
(1)

where the dependent variable \(\varDelta i\) represents new daily COVID-19 cases in country c at time t, relative to time t-1. It warrants emphasis that the adoption of daily counts of new COVID-19 cases as a metric to model the trajectory of the pandemic entails certain limitations. This operational representation is influenced by factors including national testing protocols, test accuracy, latent asymptomatic instances, and possible manipulation of data by governmental entities. Notwithstanding these limitations, we believe it remains the best approach among those currently available. Indeed, while we acknowledge that the factors listed above may introduce bias, we also assert that the alternative operationalizations conventionally used in the literature (notably, the count of COVID-19-associated fatalities or the excess of 2020 deaths relative to previous periods) are even less likely to yield an unbiased estimation. The former is susceptible to analogous reporting issues and may likewise misrepresent the true figures. The latter, though ostensibly a superior choice for measuring the effects of COVID-19, is accompanied by its own set of drawbacks. First, the accessibility of daily mortality data in a cross-country dataset, for both 2019 and 2020, is limited. Second, and more importantly, delineating which fatalities are directly ascribable to COVID-19 due to lack of NPI compliance, as opposed to deaths that would have ensued irrespective of it, due to advanced age or preexisting medical conditions, poses a formidable, and perhaps insurmountable, empirical challenge. Specifically, while it is logical to establish a correlation between instances of COVID-19 and recent instances of NPI non-compliance, forging a direct connection between COVID-19-related deaths and the infection event (separated by a time span that may be considerably protracted), as well as the stringency of the measures imposed, is considerably more complex.
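As an illustration of how such a dependent variable could be built, the sketch below assumes that new daily cases are obtained as the first difference of a cumulative case series and then scaled to cases per million inhabitants. The function names and the toy figures are hypothetical, and the source dataset may already report daily counts directly.

```python
def per_million(count, population):
    """Scale a raw count to cases per million inhabitants."""
    # Multiplying before dividing is algebraically the same as
    # (count / population) * 1_000_000 and avoids small rounding drift.
    return count * 1_000_000 / population


def daily_new_cases_pm(cumulative, population):
    """First difference of a cumulative case series, per million inhabitants.

    The first day has no predecessor, so its value is None.
    """
    new_cases = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        new_cases.append(per_million(current - previous, population))
    return new_cases


# Hypothetical country with 2 million inhabitants and four days of data.
cumulative_cases = [0, 10, 30, 60]
print(daily_new_cases_pm(cumulative_cases, 2_000_000))
```

Leaving the first observation undefined, rather than setting it to zero, keeps the series honest about the day for which no change can be computed.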

Returning to the description of the model, the daily count of new COVID-19 cases, derived from Hale et al. (2020a, b) and rescaled to cases per million inhabitants (to have easier-to-read coefficients) by dividing the raw value by the population of the country (World Bank data for 2019) and multiplying the result by 1,000,000, is modeled as a function of several factors. The variables on the right-hand side of the equation include:

  • The total infections in country c on the previous day (t-1), separated into a within-country component \(({i}_{ct-1}-\bar{i}_{c})\) and a between-country component (the country mean, \(\bar{i}_{c}\)), as is customary in these kinds of models. This variable is the cumulative number of cases (i.e. the running sum of daily cases) registered at time t-1 in country c, gathered once again from Hale et al. (2020a, b), and rescaled to cases per million inhabitants (again so as to have easier-to-read coefficients) by dividing the raw value by the population of the country (World Bank data for 2019) and multiplying the result by 1,000,000.

  • An index that proxies the level of stringency measures. Following previous contributions (Alfano 2022a and b), this index reflects the rigor of the NPIs enforced in country c twenty-eight days before time t. This delay is necessary to capture the impact of NPIs on new COVID-19 case reporting, since these policies need some time to show results. The index is operationalized using the Oxford Stringency Index of Hale et al. (2020a, b), a measure of the stringency level in countries all over the world. As with the previous variable, it is decomposed into its within and between components. The existing literature indicates that for the alpha variant, which was dominant in 2020, approximately 97.5% of symptomatic individuals exhibited symptoms within 11.5 days of infection, with a 95% confidence interval ranging from 8.2 to 15.6 days (Lauer et al. 2020). Furthermore, previous studies have highlighted the existence of a “weekend effect” on COVID-19 cases (Soukhovolsky et al. 2021), which could be attributed to reduced testing and slower test processing during weekends, resulting in lower case numbers on Saturdays, Sundays, and Mondays. To address these concerns and mitigate bias in our estimations, we opted, in line with previous contributions (Alfano 2022a and b), to introduce a lag of 28 days for Str in the regression analysis. Since 28 days is a whole number of weeks, each observation is paired with the stringency level recorded on the same day of the week, which neutralizes the weekend effect; and since the lag exceeds the upper bound of the incubation interval, we avoid measuring the impact of Str on individuals who have not yet exhibited symptoms after the implementation of NPIs. Thus, the value of Str in country c at t = 29 corresponds to the Oxford Stringency Index value for country c on day 1. This approach prevents changes in the number of new cases (New cases pm) from being attributed to NPIs before sufficient time has elapsed for them to have had an impact on the spread of the contagion.

  • Time-invariant variables, labelled Internet in Eq. (1), encompassing, in distinct regressions, either the share of Internet users within the country (gathered from World Development Indicators, data referred to 2019 to avoid any look-behind effect, except for Libya, absent from the aforementioned source, for which data are gathered from TradingEconomics), or a matrix comprising the previous variable, an index of Internet freedom (Freedom on the Net score from Quality of Government data, referred once again to the year 2019), and their interaction (more details on this further below).

  • A set of five time-invariant control variables for each country c, incorporated to account for additional factors affecting contagion dynamics, consistent with prior research. Following previous literature on the theme (Alfano 2022; Alfano and Ercolano, 2022), this matrix includes: a measure of wealth in the country, which has been suggested as being a determinant of COVID-19 spread (GDP per capita expressed in purchasing power parity, gathered from the World Development Indicators of the World Bank and referring to 2017, the last year for which data are available for many countries); the population density, which, as is well known, may affect the spread of a virus (data gathered from the same source); the share of the population in the country that is at least 65 years old, following the assumption that, even if Omori et al. (2020) show that susceptibility is not particularly associated with age cohort, as implied by Zhang et al. (2021), factors like mobility, compliance with restrictions, and social distancing, all of which affect COVID-19 diffusion, can be influenced by the age structure of a country’s population (data gathered once again from the same source); a measure of state capacity in 2019, the State Capacity Comprehensive Index (O’Reilly and Murphy 2022), included to control for the capability of the state to implement NPIs; and an operationalization of democracy that Hadenius and Teorell (2007) suggest performs better both in terms of validity and reliability than its constituent parts (the fh_polity2 democracy index from Freedom House and Polity, gathered from the Quality of Government dataset of Dahlberg et al. 2019; to build this variable, the average Freedom House score and the Polity score are each rescaled to 0–10 and then averaged into fh_polity2. For more details we direct the reader to Dahlberg et al. 2019). Please note that this last variable might seem redundant in models where freedom of the Internet is included, since the two dimensions can be connected. However, we believe that the two concepts are different: democratic countries may have varying rules regulating a borderless Internet, whereas autocracies may not be very effective in restricting access to and the spread of information on the Internet, resulting in a very free Internet.

  • A matrix of dichotomous variables to control for fixed effects for each month included in the study (except January, which serves as the reference modality), to prevent temporal shifts and the pandemic’s evolution from introducing biases (denoted as T); a second matrix of continent fixed effects, assigned to each country, to control for possible geographical effects (denoted as Cont); and the interaction between the two, to control for indirect effects arising from the combination of these two dimensions.
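The construction of the democracy control described in the list above (Freedom House and Polity each rescaled to 0–10 and then averaged) can be illustrated with a short sketch. This is not the Quality of Government code: the function names are hypothetical, the input ranges (Freedom House ratings from 1 to 7 with 1 as the freest, Polity from -10 to +10) are assumptions about the underlying scales, and any treatment of missing values is ignored.

```python
def rescale(value, low, high, reverse=False):
    """Linearly map value from [low, high] onto [0, 10]."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    scaled = 10 * (value - low) / (high - low)
    return 10 - scaled if reverse else scaled


def fh_polity2_sketch(fh_political_rights, fh_civil_liberties, polity2):
    """Illustrative combined democracy score on a 0-10 scale.

    Assumed input scales (not taken from the article):
    Freedom House ratings run from 1 (most free) to 7 (least free),
    Polity runs from -10 (autocracy) to +10 (democracy).
    """
    fh_average = (fh_political_rights + fh_civil_liberties) / 2
    # Reverse Freedom House so that higher values mean more democracy.
    fh_scaled = rescale(fh_average, 1, 7, reverse=True)
    polity_scaled = rescale(polity2, -10, 10)
    return (fh_scaled + polity_scaled) / 2


# Hypothetical country with mid-range ratings on both sources.
print(fh_polity2_sketch(4, 4, 0))
```

Under these assumptions a country at the freest end of both sources scores 10 and one at the opposite end scores 0, which matches the 0–10 range described in the text.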

This empirical approach allows for the estimation of the correlation between Internet diffusion among the population (represented by the coefficient \({\beta }_{5}\)) and COVID-19 transmission. It considers both the contagion trend in each region on day t, via the inclusion of the cumulative cases on the right-hand side of the equation, and the level of stringency in the NPIs implemented by each national government.

As already explained, the analysis focuses on the pandemic’s initial phase, i.e. the first wave. This mitigates biases stemming from changes in the spread of COVID-19 over time due to the preparedness of the public and of the institutions available to face the virus, the varying incubation time among the different variants (Weng and Yi 2022), and the different testing strategies adopted at the national level by the different governments (and also the corresponding differences in the reports from which our data are gathered). This allows us to scrutinize the impact of Internet use on the verge of an unforeseen shock, rather than responses to an ongoing pandemic that may be affected by other determinants, and this difference is very important. Indeed, the pandemic affected both the number of Internet users and the freedom of the Internet (Shahbaz et al., 2020 and 2021). This choice also aligns with earlier studies emphasizing the first wave’s importance for governance assessment, considering governments’ relative unpreparedness and lack of information (Alfano 2022a and b; Alfano and Ercolano, 2022).

The final dataset is composed of 216 daily observations (for the 244 days from 1 January 2020 to 31 August 2020, excluding the 28 observations lost to lag the values of Str) in 60 countries (i.e. all those for which complete data are available; a complete list is presented in Appendix 1), giving a total of 12,960 observations. Descriptive statistics are presented in Table 1, while Fig. 1 presents heat maps with the most important variables for the countries included in the study, to briefly give an idea of the distribution.
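The sample size reported above can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text (study window, 28-day lag, 60 countries); the variable names are illustrative.

```python
from datetime import date

first_day = date(2020, 1, 1)
last_day = date(2020, 8, 31)
lag_days = 28       # observations lost when lagging the stringency index
n_countries = 60

# Both endpoints belong to the study window, hence the +1.
calendar_days = (last_day - first_day).days + 1
usable_days = calendar_days - lag_days
total_observations = usable_days * n_countries

print(calendar_days, usable_days, total_observations)
```

Because the panel is balanced, the total is simply the number of usable days multiplied by the number of countries.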

Table 1 Descriptive statistics
Fig. 1 Heat map of the main variable of the study. Source: Author’s elaboration from data indicated in the article

4 Results

Estimations of Eq. (1) through a Feasible Generalized Least Squares (F-GLS) estimator, with standard errors clustered at the country level, are presented in Table 2, with the first column (2.1) dedicated to the share of Internet users in the population from the complete sample, and the second (2.2) from a smaller one, composed only of the Western countries present in the sample. More precisely, the first sample (2.1) is the most complete, including all the 60 countries that it was possible to include in the dataset (i.e. all those for which all data are available). The second (2.2) is a subsample composed of only the following Western countries: Australia, Canada, France, Germany, Hungary, Italy, the United Kingdom, and the United States. The idea is to test the RQs in a more culturally homogeneous group of countries, and one with higher levels of democratic accountability, hence possibly offering better COVID-19 data reporting (Annaka 2021). For obvious reasons regarding the reduced size of the sample, the estimations on the subsample do not include the matrix Cont, since this would have only one country for each continent.

Table 2 F-GLS Hybrid Model-Complete and Western subsample

Our results suggest that the total number of cases on the day before, i.e. the operationalization of the evolution of the pandemic, has in both its within and between components a positive and statistically significant effect on the number of new daily cases. This confirms once again the exponential nature of the pandemic. The lag of the stringency index also has a positive and statistically significant effect on new cases, a result already found by previous contributions (Alfano 2022c; Alfano et al. 2022), and confirmed in this setting too. By contrast, the coefficient of GDPpc shows a negative sign in the global sample, suggesting that richer countries had fewer daily COVID-19 cases, and a positive one in the Western subsample, suggesting that among these countries (the richer part of the complete dataset) the opposite is true. A possible explanation for this finding is that the richer Western countries, which were more connected, in terms of travel and trade, with the origin of the pandemic, paid a higher price because of that during the first wave. No clear effect emerged from the analysis of the coefficients of population density, share of the population over 65 years old, and state capacity. Moreover, looking at the Polity Index, our measure of democracy, there is a positive effect in the global sample, while within the Western subsample higher democracy corresponds to a lower number of cases. This may be due to the possibility that autocracies tampered with the number of reported cases, as hypothesized by a part of the literature (Chen 2020; Badman et al. 2021).
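The within and between components mentioned above can be illustrated with a minimal sketch. It assumes the usual hybrid-model decomposition, in which the between component is the country mean of a time-varying variable and the within component is the deviation from that mean; the function name, column names, and toy values are invented for illustration and are not the article's data.

```python
from collections import defaultdict
from statistics import mean


def within_between(rows, key="country", var="cases_lag"):
    """Add a between part (country mean) and a within part (deviation) to each row."""
    values_by_country = defaultdict(list)
    for row in rows:
        values_by_country[row[key]].append(row[var])
    country_mean = {c: mean(v) for c, v in values_by_country.items()}

    decomposed = []
    for row in rows:
        between = country_mean[row[key]]
        decomposed.append({**row, "between": between, "within": row[var] - between})
    return decomposed


# Toy panel: two hypothetical countries observed on two days each.
toy_panel = [
    {"country": "A", "cases_lag": 10.0},
    {"country": "A", "cases_lag": 14.0},
    {"country": "B", "cases_lag": 2.0},
    {"country": "B", "cases_lag": 6.0},
]
decomposed = within_between(toy_panel)
for row in decomposed:
    print(row)
```

By construction the within component sums to zero inside each country, so its coefficient reflects variation over time, while the between coefficient reflects differences across countries.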

Coming to the variable of greatest interest for the present work, the number of Internet users is positively related in a statistically (very) significant way to the number of daily new cases, in both the complete sample and in the Western subsample (2.1 and 2.2). This first finding suggests that higher diffusion of the Internet corresponds to a higher diffusion of the virus. More precisely, for an increase of 1% in the population using the Internet, there are an estimated 1.29 more daily cases per 100,000 inhabitants. This number seems significant given the exponential nature of the pandemic, which could easily and swiftly lead to a worrying increase in the number of cases if left unattended. This number shrinks to 1.08 in the Western subsample, suggesting that this set of countries is slightly less sensitive to the variable than the rest of the sample. All these findings suggest that a positive answer can be given to RQ1.
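To give a sense of scale, the per-100,000 coefficients reported above can be converted into absolute daily cases for a given population. The sketch below uses a hypothetical country of 10 million inhabitants; only the two coefficients (1.29 and 1.08) come from the text, and the function name is illustrative.

```python
def extra_daily_cases(coefficient, population, increase=1.0):
    """Extra daily cases implied by a given increase in the share of Internet users.

    coefficient: additional daily cases per 100,000 inhabitants for a
    one-unit increase in the share of Internet users, as reported in the text.
    """
    return coefficient * increase * population / 100_000


population = 10_000_000  # hypothetical country size, not from the article

full_sample = extra_daily_cases(1.29, population)
western = extra_daily_cases(1.08, population)

print(f"complete sample: about {full_sample:.0f} extra daily cases")
print(f"Western subsample: about {western:.0f} extra daily cases")
```

This is a linear, static reading of the coefficient and does not capture any compounding through the lagged cumulative cases included in the model.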

An important limitation of this empirical approach is that it does not take into account a direct measure of vulnerability of the population to misinformation. To try to overcome this problem, we amended Eq. (1), operationalizing the matrix Internet so that it also includes a measure of trust in science as an interaction term with the share of Internet users. It has been suggested that trust in science affects misinformation about COVID-19 (Bertin et al., 2020; Agley and Xiao 2021). Hence, using this approach it is possible to study the interaction between a measure of vulnerability to misinformation (i.e. trust in science) and an indirect measure of the popularity of the main channel we assumed is used for its spread (level of Internet diffusion in the country). It seems plausible that this empirical approach can be used to distinguish the effects of the two variables, considering both the trust of citizens in science and their exposure to a principal misinformation channel.

This new variable, gathered from Our World In Data, measures the share of respondents who answered “a lot” or “some” to the question: “How much do you trust science?” Unfortunately, these data are not available for 11 countries that are included in the main sample (namely, Angola, Azerbaijan, Belarus, Gambia, Iceland, Libya, Malawi, Pakistan, Rwanda, Singapore, and Sudan), which are hence excluded from this robustness test. The results, presented in Fig. 2, are consistent with the idea that, considering the two variables jointly, the level of trust in science has no effect on new cases, and that on the contrary the relationship is driven by the level of diffusion of the Internet. In other words, while trust in science is very important for informed decision-making, public health, and global well-being (Bertin et al., 2020; Agley and Xiao 2021), we may conclude that the Internet has changed information dynamics, and continues to be a cornerstone in the fight against misinformation when considering the former dimension too. Another problem may be that, as highlighted in the WHO definition of ‘infodemic’, reported above, the huge quantity of information available once one has access to the Internet makes it hard to process this, even for citizens who trust in science.

Fig. 2 Impact of Internet users on New COVID-19 cases at different levels of Trust in Science. Source: Author’s elaboration from data indicated in the article

To answer RQ2, and look for any effect from the interaction of the number of Internet users and the degree of freedom on the Web, as discussed above we once again slightly amended Eq. (1), operationalizing the variable Internet there as a matrix composed of Internet User, Freedom of the Net and their interaction. We computed the marginal effects of this interaction model, which are presented in Table 3, and also graphically in Fig. 3, concerning the interaction of the two variables of interest (only for the global sample, since unfortunately reduced size and variance made it impossible to estimate the marginal effects of the interaction in the subsample).
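How marginal effects are obtained from an interaction model of this kind can be sketched as follows. With an interaction term, the effect of Internet users on new cases is the coefficient on Internet users plus the interaction coefficient multiplied by the Freedom of the Net score, and its standard error follows from the delta method. All coefficient, variance, and covariance values below are hypothetical numbers chosen only to make the arithmetic visible; they are not the estimates in Table 3, and the score is assumed to range from 0 to 100 with higher values meaning a freer Internet.

```python
import math


def marginal_effect(freedom, b_users, b_inter, var_users, var_inter, cov_ui):
    """Marginal effect of Internet users at a given Freedom of the Net score.

    For y = ... + b_users*Users + b_free*Freedom + b_inter*Users*Freedom,
    the derivative with respect to Users is b_users + b_inter*Freedom.
    Returns (effect, standard_error), with the standard error from the delta method.
    """
    effect = b_users + b_inter * freedom
    variance = var_users + freedom ** 2 * var_inter + 2 * freedom * cov_ui
    return effect, math.sqrt(variance)


# Hypothetical estimates, NOT the article's results.
coefs = dict(b_users=2.0, b_inter=-0.02, var_users=0.25,
             var_inter=0.0001, cov_ui=-0.004)

for score in (20, 40, 60, 80):
    effect, se = marginal_effect(score, **coefs)
    low, high = effect - 1.96 * se, effect + 1.96 * se
    verdict = "significant" if low > 0 or high < 0 else "not significant"
    print(f"Freedom={score}: effect={effect:.2f}, "
          f"95% CI=({low:.2f}, {high:.2f}), {verdict}")
```

A negative interaction coefficient, as in this toy example, makes the effect of Internet users shrink as the score rises, until the confidence interval includes zero.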

Table 3 F-GLS hybrid model-marginal effects-complete and western subsample
Fig. 3 Impact of Internet users on New COVID-19 cases at different levels of Freedom on the Net. Source: Author’s elaboration from data indicated in the article

This analysis also confirms the positive effect on the spread of the virus caused by the number of Internet users, in both the samples. However, when looking at the two variables at once, and at their interaction, the model confirms that there is no effect in the global sample, while it suggests a reduction of new cases due to an increase in Freedom of the net in the Western subsample, as presented in Fig. 3. This provides a first answer to RQ2, suggesting that an interaction effect is indeed in place, and that when looking at the intertwined effect of the two variables an increase in Internet freedom reduces the positive impact of more Internet users on the daily count of COVID-19 cases. Hence, it can be derived that more Internet users increase the diffusion of the virus, but this increase diminishes the freer the Internet is. The model predicts that on average, in the global sample, a score of around 65 in the Freedom of the net variable makes the increase in COVID-19 cases due to more Internet users statistically non-significant.

5 Discussion and Conclusions

This study has explored the interdependencies between Internet connectivity, pandemic dynamics, and societal responses. It aimed to unravel the relationship between Internet access, freedom of the Web, and the spread of COVID-19 during the first wave of the pandemic, a phase characterized by the prevalence of NPIs as primary mechanisms for containment of the epidemic. The methodological foundation of the study was the analysis of a balanced panel dataset, comprising 60 diverse countries, coupled with empirical techniques that accounted for temporal, geographical, and contextual factors. The decision to focus on the first wave was deliberate, acknowledging the contribution toward an unbiased estimation it provides, devoid of the complicating influence of vaccine availability in subsequent waves.

The empirical findings yielded insights that illuminate the various dimensions of this multifaceted nexus. The impact of Internet users on the spread of COVID-19 emerged, revealing a positive association between a higher share of the population using the Internet and an increase in new daily COVID-19 cases. This underscores the role of the Internet in shaping pandemic dynamics, albeit in a manner that accentuates the complexities of its implications. Indeed, our findings are not compatible with the idea that widespread Internet use among the population made stay-at-home orders and lockdowns more efficient, but on the contrary suggest a correlation between Internet diffusion and COVID-19 cases. The magnitude of this relationship is smaller within a subset of Western countries that are possibly more culturally homogenous, as well as providers of more reliable data, shedding light on the subtleties of the interplay between digitalization and pandemic responses.

This dynamic does not seem to be affected by the level of trust in science. While one might suppose that populations in countries where trust in science is more widespread are less affected by misinformation spread over the Internet, our results point in another direction, suggesting that no one is immune to the consequences of an infodemic. If we look at the consequences of the infamous suggestion made by Donald Trump during a White House briefing on 23 April 2020, in which he suggested that injecting disinfectant into the body might be a way to treat COVID-19, one may actually consider it reasonable that trust in science is a dimension that can be crowded out by so-called “information bombing”.

In parallel, this study also examined the intricate interconnection between Internet freedom and COVID-19 trends, through the share of the population using the Internet. Our results suggested that when looking at the interaction between the two variables, an increase in the degree of freedom on the Web is correlated to a reduction in the increase of new COVID-19 cases due to an increase in Internet users. This finding is compatible with the idea that a freer Internet creates more informed citizens, who behave better in response to the pandemic, and at the same time rejects the idea that censoring the Internet contributes to the halting of harmful fake news. Our results indicate that reducing Internet freedom does not reduce the spread of COVID-19, and in fact that in countries where the Internet is freer, a higher share of Internet users in the population does not correspond to a higher diffusion of COVID-19.

In any case, the intricacies of the relationship between Internet freedom and COVID-19 dynamics likewise intertwine with the multifaceted landscape that lies in the background. The prevalence of misinformation on social media, as emphasized by Bridgman et al. (2020) and Lee et al. (2020), may not be exacerbated by greater digital freedom, contrary to what might be supposed. This could simply be due to the fact that people adapt to this “Far West” context, and hence are more skeptical, and less susceptible to fake news, or even to the fact that government control of the Internet is quite ineffective in a world where virtual private networks are commonly used and very cheap. At any rate, the dynamic highlighted by our analysis, where a higher degree of Internet freedom translates into a lower increase in daily COVID-19 cases due to the number of Internet users, cautions against government censorship of the Web, and underscores the need for better thought-out strategic policy interventions.

If we look at previous studies, it can be derived from the present analysis that there is no support in our empirics for the arguments of Barna (2020) and Feldmann et al. (2021), who suggest that greater Internet access can facilitate work, learning, and socialization from home, thus enabling compliance with NPIs. Actually, our results go in quite the opposite direction, i.e. a greater share of Internet users in the population leads to a higher number of daily COVID-19 cases. At the same time, our results resonate with the findings of Bridgman et al. (2020), Cuan-Baltazar et al. (2020), Himelein-Wachowiak et al. (2021), Lee et al. (2020), and Tasnim et al. (2020), who suggest that social media may have fueled the spread of COVID-19 misinformation, and in this way hindered the efficacy of NPIs.

As regards Internet freedom, the results of the interaction model suggest this correlation: the greater the degree of freedom on the Web, the lower the impact of more Internet users on the spread of COVID-19 will be. The nuanced relationship with COVID-19 dynamics underscores the lack of necessity for government frameworks when it comes to regulating the Internet, something very different to what has happened in recent years as a response to the pandemic (Shahbaz et al., 2020 and 2021). Governments have, on average and in general, reacted to the pandemic by restraining Internet freedom (Shahbaz et al., 2020 and 2021), but our findings indicate that restricting digital liberties did not safeguard public health, and in fact quite the contrary emerges according to our analysis. This may be due to a framework in which Internet users, aware of the restrictions imposed on the Web, lost trust in what they learned online, which would include government health prescriptions. This is compatible with a scenario in which citizens did not follow NPIs and other public health suggestions, and it led in the end to a higher number of cases. These findings call for informed policy interventions, which consider not only the immediate effect these policies aim to reach in the technological landscape, but also include a more careful and thoughtful analysis of the sociopolitical context within which they unfold, which tries to estimate the true, final effect of such policies, including several determinants.

In the grand tapestry of the pandemic era, this study contributes a carefully woven thread that enhances our comprehension of the manifold relationships between technology, information dissemination, and the spread of disease. Our results underline the multidimensionality of the influence of the Internet, as a vector of both empowerment and vulnerability. As societies grapple with the realities of information proliferation, misinformation, and digital freedom, a concerted approach that combines empirical rigor, policy acumen, and technological stewardship emerges as the way forward.

However, it is crucial to acknowledge the limitations inherent in the present study. Firstly, our analysis relies on macro-level data, which aggregates information at the country level. This approach may lead to the ecological fallacy, where conclusions drawn at the country level may not accurately reflect individual-level behaviors or outcomes. While the results provide insights into the relationship between Internet access, Internet freedom, and the spread of COVID-19, caution must be exercised when generalizing these findings to individual behaviors or local contexts.

Moreover, we proxied the level of misinformation within the countries studied in this analysis with the levels of Internet access, Internet freedom, and their interaction. This is of course a questionable assumption, which does not actually measure the (very difficult to quantify) level of misinformation, but only the popularity of its main habitat. Hence, while we tried to smooth this angle in a robustness check that includes a measure of trust in science, it is important to highlight that this assumption could lead to wrong results if it turns out to be unfounded.

Also, data that we found on Internet use are unfortunately not divided by the types of this use. This implies that our notion of Internet diffusion cannot discriminate what use people make of the Internet. Unfortunately, to the best of our knowledge, there are no better data available for such a big panel that could be used alternatively, so this opportunity remains open for future studies. Moreover, we assumed that the freedom of the Internet mitigates the adverse impact of Internet access on COVID-19 diffusion since a freer Internet creates more informed citizens. This mechanism has not been tested empirically due to lack of data, and while we are encouraged by previous studies that suggest that a freer Internet can contribute to creating more informed citizens (Klingner, 2019), and that it provides easier access to useful information and health-related resources, empowering individuals to make informed decisions about their health (Santana et al. 2011), it is important to recognize this limit in our analysis.

The study’s conclusions are limited to the observed patterns at the country level and do not account for potential variations within each country’s population. As with all cross-country studies, our estimations are an average effect in very different countries, rather than precise estimations on a specific sample. Even if this should help increase the external validity of our results, this inevitably leads to less precise estimations, as shown clearly by the different coefficients in specifications on the complete sample and the Western subsample.

Furthermore, the use of a panel data analysis, while valuable for examining temporal trends, has its limitations. The study’s results are contingent upon the availability and accuracy of the data sources that are utilized. Variations in data reporting, testing protocols, and case identification across countries may introduce biases and measurement errors into the analysis. Additionally, the study focuses on the initial phase of the pandemic, when NPIs were a primary means of curbing transmission. This choice, while theoretically sound, restrains the generalizability of the findings: the dynamics of the spread of COVID-19 may have evolved over subsequent waves due to changes in testing strategies, levels of public awareness, vaccine distribution, and evolving policies and preparedness. Finally, the sample is derived from the availability of the data, rather than the result of a sampling operation, and while we believe that our sample is composed of a set of very heterogeneous countries, this may introduce biases into the analysis.

The synthesis of empirical insights and theoretical considerations precipitates a host of implications and future directions. The observed positive association between Internet users and the spread of COVID-19 prompts reflections on strategies for responsible information dissemination. Encouraging media literacy, fact-checking mechanisms, and proactive measures to counteract misinformation gains heightened significance, as the study hints at the potential for digital connectivity to amplify the viral trajectory, as well as at the detrimental effects of restricting freedom on the Internet, suggesting that there are no alternatives to critically conscious citizens, and that censorship is not a viable shortcut.

The interplay between Internet users and Internet freedom, as shown by their interaction effects, invites systematic exploration. Future research could delve deeper into the mechanisms that underpin this intricate interdependence. Societal attitudes, information consumption patterns, and digital literacy may collectively shape the contours of this relationship, warranting interdisciplinary inquiry that spans technology, sociology, and policy studies. Moreover, a heterogeneity analysis of the type of media consumption might strengthen the literature, also exploiting the distinction between social and traditional media, and the information versus disinformation channel.

In conclusion, this study enriches our comprehension of the nexus between digital connectivity and pandemic dynamics. As societies navigate the digital frontier in an age of pandemics, the synthesis of empirical evidence and theoretical underpinnings guides us toward a more informed, resilient, and proactive approach to technology-mediated challenges.