As of late April 2020, there were more than three million confirmed cases worldwide of the coronavirus disease COVID-19 (CSSE, 2020), which is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). While enquiries into the exact origins of the virus are still underway, the first human case was identified in late 2019 in Wuhan, China. Subsequent person-to-person transmission took place mainly through respiratory droplets produced by the coughs or sneezes of infected persons in close proximity to other people (Huang et al. 2020; Peeri et al. 2020).

Air travel is a well-established cause of the global spread of previous influenza outbreaks (Grais et al. 2003), and without this mode of transportation the coronavirus would not have arrived in Europe so quickly. The first cases in Europe were confirmed in late January and early February, and although international travel restrictions to and from China were quickly imposed, by late March about half of the world’s reported cases were in Europe (CSSE, 2020). Once the virus had reached Europe, human interaction and close contact allowed it to spread quickly. Previous research on virus transmission, including on COVID-19, has shown social contact to be very important (Bayer & Kuhn, 2020; Bi et al. 2020; Liu et al. 2020; Mossong et al. 2008; Wallinga et al. 2006). Using modelled estimates based on both empirical (the POLYMOD survey) and synthetic data, Prem, van Zandvoort, et al. (2020) showed how altering (intergenerational) contact patterns, by reducing physical contact or by shielding, would lead to large reductions in transmission of the virus. At the same time, cultural differences within Europe are well established. For instance, intergenerational contact is more frequent in Southern European countries than in the less family-oriented countries of Western and Northern Europe, as social norms about providing support to family members and maintaining close familial interactions are stronger in the South (Reher, 1998; Sánchez Rodríguez et al. 2014). With respect to COVID-19 case fatality rates, Arpino et al. (2020) recently showed these to be broadly positively associated with intergenerational co-residence and contact at the national level across a selection of European countries. However, because the association did not hold at the province level in Italy, no conclusive interpretation could be drawn, and the authors advocated accounting for confounding factors when analysing the effect of intergenerational relations.

Nevertheless, because patterns of social mixing are embedded in socioeconomic and cultural factors and differ greatly across Europe, the continent is an excellent setting in which to study the spread of COVID-19. The present study therefore analyses the statistical associations between different indicators of social and economic ties and the reported number of confirmed COVID-19 cases in 23 European countries between March 1 and April 30, 2020.

Data and method

Data on the number of confirmed cases of COVID-19 come from the Center for Systems Science and Engineering at Johns Hopkins University (CSSE, 2020). Data on the covariates come from different sources and refer to years as close to 2020 as possible (see notes under Table 1). Data from the European Social Survey (ESS), Eurostat, the World Bank, and the OECD were used to approximate different types of social and cultural ties, which we hypothesize to be positively associated with the number of confirmed COVID-19 cases: average number of household members; percentage of people living in a multigenerational household; proportion of people who have frequent social meetings with friends, relatives, or colleagues; and religious attendance. We also test the effect of two socioeconomic variables, tertiary education and GDP per capita, as we assume that populations of more highly educated or more economically developed countries are more likely to pursue activities that require travelling (e.g., international business meetings, skiing), factors that contributed to the initial outbreak of the epidemic. Lastly, we test the effect of three demographic variables: the share of the population aged 65+, population density, and the per capita number of beds in nursing and residential care facilities, all of which are expected to be positively associated with COVID-19 (Table 1).

Table 1 Descriptive statistics of the number of reported confirmed COVID-19 cases and the covariates used in the analysis

Bivariate associations between the covariates and the natural logarithm (ln) of the cumulative number of confirmed COVID-19 cases between March 1 and April 30 are analysed at 10-day intervals to ascertain whether the direction and strength of the associations changed over time. Later dates were not analysed because country differences in the severity and timing of the movement and social-contact restrictions that European governments implemented during this period would confound the effect of the social and economic tie variables tested. In the Supplementary material, the analyses are repeated for COVID-19 cases per 100,000 population and for the number of new cases during each 10-day period.
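A minimal sketch of how the three outcome measures described above could be derived from a country's cumulative case series, and how a bivariate correlation could then be computed across countries. The function names (`snapshot_dates`, `outcome_measures`, `pearson`) and the `offset` parameter are hypothetical. The source does not say how countries with zero confirmed cases on the earliest dates were handled in the log transformation, so this sketch assumes ln(cases + 1); setting `offset=0` gives the plain natural log. Note that a 10-day grid from March 1 to April 30 inclusive yields seven snapshot dates.

```python
import math
from datetime import date, timedelta


def snapshot_dates(start=date(2020, 3, 1), end=date(2020, 4, 30), step_days=10):
    """Dates at fixed 10-day intervals from start to end, inclusive."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def outcome_measures(cumulative, population, offset=1.0):
    """Derive three outcomes from cumulative confirmed cases on each snapshot date.

    Assumption (hypothetical): ln(cases + offset) guards against zero counts.
    """
    ln_cumulative = [math.log(c + offset) for c in cumulative]
    # Multiply before dividing so round inputs give exact rates.
    per_100k = [c * 100_000 / population for c in cumulative]
    # New cases within each 10-day period between consecutive snapshot dates.
    new_per_period = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    return ln_cumulative, per_100k, new_per_period


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print([d.isoformat() for d in snapshot_dates()])
    # Illustrative numbers only, not data from the study.
    ln_cum, rate, new = outcome_measures([0, 10, 100], population=1_000_000)
    print(rate, new)  # [0.0, 1.0, 10.0] [10, 90]
```

In the bivariate analysis, `pearson` would be applied across the 23 countries at each snapshot date, pairing one covariate with the ln-transformed cumulative count (or with the per-100,000 rate or the 10-day increment for the supplementary analyses).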

Ordinary least squares linear regression was used to assess the unique association between confirmed cases of COVID-19 and the covariates. However, as covariate data could only be obtained for 23 countries, too few to test all covariates simultaneously without overfitting (Harrell Jr et al. 1984; Peduzzi et al. 1996), we first reduced the number of variables by performing a factor analysis. This method not only simplifies the subsequent analysis but also reveals groupings of variables that we would not otherwise have thought of, enabling us to work at a more sophisticated conceptual level (De Vaus, 2002).
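The overfitting concern can be made concrete: with the nine covariates listed above and 23 countries, a model including all of them would have only about 2.6 observations per predictor (23 / 9), well below the commonly cited rule of thumb of around ten observations (or events) per predictor. The sketch below illustrates one common way to carry out the reduction, assuming principal-component extraction from the correlation matrix followed by varimax rotation. The source does not state which extraction or rotation method was used, and the function names and the random input matrix are hypothetical stand-ins for the real country-level covariate data.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Standard SVD-based varimax update.
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def extract_factors(X, n_factors=3):
    """Principal-component factors of the standardized columns of X (countries x covariates).

    Returns varimax-rotated loadings, the share of total variance retained,
    and regression-method factor scores (one row per country).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    explained = eigvals[:n_factors].sum() / eigvals.sum()
    rotated = varimax(loadings)
    # Regression-method scores: Z R^{-1} L.
    scores = Z @ np.linalg.solve(corr, rotated)
    return rotated, explained, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(23, 9))  # placeholder for 23 countries x 9 covariates
    loadings, explained, scores = extract_factors(X)
    print(f"variance retained: {explained:.0%}")
    # Covariates with |loading| > 0.75 would guide how each factor is labelled.
    print(np.abs(loadings) > 0.75)
```

Because varimax is an orthogonal rotation, it redistributes the retained variance among the factors without changing the total, which is why the share of variance explained can be reported before labelling the rotated factors.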

The factor analysis yielded three sociodemographic latent factors that together explained 78% of the cross-country variation in the selected covariates. Each latent factor has high loadings (> 0.75) on one or more covariates, on the basis of which Factor 1 was labelled “socially and economically vibrant”, Factor 2 “relatively young population”, and Factor 3 “densely populated and traditional” (see Tables S1-S4 in the Supplementary file). Correlation coefficients between COVID-19 and the latent factors were calculated first (Table 3), before multivariate regression analyses were performed to obtain the adjusted R² of the models (Table 4) and the unstandardized coefficients, i.e., the factors’ slopes (Table S5).
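To make the adjusted R² step concrete, here is a minimal sketch of an ordinary least squares regression of the ln-transformed case count on three factor scores, with adjusted R² computed as 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of countries and k the number of factors. With n = 23 and k = 3, the unexplained share is inflated by a factor of 22/19. The function name and the simulated inputs are hypothetical, and the source does not specify the software used.

```python
import numpy as np


def ols_adjusted_r2(scores, y):
    """OLS of y on the columns of `scores` plus an intercept.

    Returns the unstandardized coefficients (intercept first), R^2 and adjusted R^2.
    """
    n, k = scores.shape
    design = np.column_stack([np.ones(n), scores])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize for the k predictors estimated from only n observations.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    factor_scores = rng.normal(size=(23, 3))  # simulated, not the study's scores
    ln_cases = 6.0 + 0.8 * factor_scores[:, 0] + 0.5 * factor_scores[:, 2] + rng.normal(size=23)
    beta, r2, adj_r2 = ols_adjusted_r2(factor_scores, ln_cases)
    print(np.round(beta, 2), round(r2, 2), round(adj_r2, 2))
```

With simulated inputs the printed values carry no substantive meaning; the point is that adjusted R² is always below R² when the fit is imperfect, and the gap widens as more predictors are added relative to the 23 available countries.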

Table 2 Correlation between the natural log of confirmed COVID-19 cases in 23 European countries at six different time periods and the covariates
Table 3 Correlation between the natural log of confirmed COVID-19 cases in 23 European countries at six different time periods and the extracted factors


Figure 1 and Table 2 present the associations between the number of confirmed COVID-19 cases and the different covariates for the six dates between March 1 and April 30, 2020. The strongest (and statistically significant) associations are observed for the social meeting and population density variables, with the latter becoming stronger over time. Figure 2 and Table 3 present the associations between the three extracted factors and COVID-19. The “socially and economically vibrant” factor has a strong positive association throughout the study period. The “relatively young population” factor is, as expected, negatively associated with COVID-19, but the association is not statistically significant. By contrast, the association with the “densely populated and traditional” factor was initially weak but grew stronger over time, and this became the most important factor by the end of March. Together, the three factors explain close to 50% of the cross-country variation in the number of confirmed COVID-19 cases between March 11 and April 30, compared with just 21% on March 1 (Table 4). The slope of the “socially and economically vibrant” factor was greatest on March 11 and that of the “densely populated and traditional” factor on April 20.

Fig. 1

Association between covariates and the natural log of cumulative COVID-19 cases on March 1, March 31, and April 30, 2020, among 23 European countries

Fig. 2

Association between factor scores and the natural log of cumulative COVID-19 cases on March 1, March 31, and April 30, 2020, among 23 European countries

Table 4 Multivariate regression analysis of social and demographic factors on the natural log of cumulative COVID-19 cases in 23 European countries on March 1, 11, 21 and 31, and April 10, 20 and 30, 2020. Unstandardized coefficients (p-value)

If we analyse the change in COVID-19 cases over 10-day periods rather than the cumulative number of cases, the results are virtually the same (Supplementary Table S5a). The “socially and economically vibrant” factor is strongly significant during March and the first 10 days of April, while the “densely populated and traditional” factor was significant throughout the entire study period and became the most important explanatory factor from March 21. The “relatively young population” factor showed little association in any of the models. The three factors explained 44–48% of the country differences in the change in COVID-19 cases, dropping to 33% during the last 10 days of April. Conversely, the factors explain much less of the country differences in the number of COVID-19 cases per 100,000 population. The correlation with the “socially and economically vibrant” factor is above 0.4 from March 31 onwards, and the same applies to the “densely populated and traditional” factor 10 days later, which again becomes the most important explanatory factor (Supplementary Table S6). The explanatory power increased steadily over time, from 3% (March 1) to 43% (April 30). This is consistent with the fact that by the end of April the countries with a high number of cases per 100,000 population included not only Italy but also densely populated Belgium and the Netherlands, while COVID-19 rates were (still) quite low in the sparsely populated Scandinavian and Baltic countries (Supplementary Figure S4).


Confirmed cases of COVID-19 increased sharply across Europe during March and April 2020. Throughout most of the study period, Italy was the country worst hit by the pandemic in absolute numbers, but Spain overtook Italy in early April, while Belgium and Ireland overtook it in terms of cases per 100,000 people. The question we posed was whether social, economic, and demographic factors could explain the observed differences across Europe.

Our results suggest that it is not so much the age structure of countries’ populations as their (historical) level of economic development and (associated) social ties that may have driven the initial spread of the COVID-19 pandemic. While these factors remained important throughout the analysed period, population density and cultural factors also contributed to the subsequent diffusion of the virus.

Considering specific examples, the Netherlands, Switzerland, and Sweden all scored high on the “socially and economically vibrant” factor and saw their numbers of coronavirus infections increase quickly during March, despite households being almost exclusively single-person or nuclear. However, an important component of this factor is also the number of available beds in nursing and residential care facilities, on which all three countries score high. Recent studies have shown that nursing homes may account for 19% to 72% of COVID-19 deaths (Comas-Herrera et al. 2020; Orange, 2020). Italy, on the other hand, which was the initial epicentre of the pandemic in Europe, scored very low on the “relatively young population” factor but high on the “densely populated and traditional” factor. In other words, its aged population, high population density, and traditional values (approximated by the proportion of people attending church weekly) are likely to have contributed to its high rate of diagnosed cases. Moreover, the relative position of other traditionally Catholic countries, including Spain, Portugal, and Belgium, worsened markedly with respect to COVID-19 between March 21 and April 10 (Fig. 2).

While the country differences in COVID-19 cases per 100,000 people across the 23 European countries could only be weakly explained by the three factors in early March, by mid-April both the “socially and economically vibrant” and the “densely populated and traditional” factors contributed significantly to explaining them, as the number of cases increased markedly in the most socioeconomically developed and densely populated European countries. More specifically, by analysing changes in COVID-19 cases over 10-day periods, we found that social and economic ties appeared to be most important during the early stage of the epidemic (early March), while population density and church attendance explained more of the growth in diagnosed cases from late March until late April (the end of the study period). In light of research from elsewhere, it is worth noting that the role of religious services in the spread of the coronavirus in South Korea is well documented through field investigations that established the source of infection for many cases (Prem, Liu, et al., 2020). Likewise, with regard to the initial spread in Europe, Bartscher et al. (2020) also found the coronavirus to be initially more prevalent in areas with high social capital. On the other hand, a relatively young, highly educated population (Factor 2) was not associated with the number of registered COVID-19 infections. We know that few asymptomatic and/or young people were tested during the first wave of the pandemic (Kohns Vasconcelos et al. 2021; Surkova et al. 2020), which could explain why the association was not positive.
Conversely, as the results also showed that the proportion of people living in multigenerational households was the most important variable in the “socially and economically vibrant” factor and was strongly positively associated with COVID-19, it appears that it is not the proportion of elderly people per se that leads to higher rates of COVID-19, but the level of social interaction (as well as the number of people in residential care homes). In this context, and supported by evidence from studies that analysed intergenerational co-residence (Esteve et al. 2020) and contact patterns (Prem, van Zandvoort, et al., 2020), tailored public health responses, in particular shielding policies for the elderly, are recommended during the early stage of coronavirus-type epidemics, to reduce their impact not only on the health of individuals and on public health care systems but also on the economy (see also Prem, Liu, et al., 2020; Prem, van Zandvoort, et al., 2020; Davies et al. 2020).

Some limitations of our study should be mentioned. First, we did not consider country differences in (the timing of) government (and individual) responses to the COVID-19 pandemic. Governments differed in when they implemented measures such as cancelling public events, closing day care centres, schools, and universities, social distancing, and partial or total lockdowns (Flaxman et al. 2020). This implies that the effect of social and demographic factors on COVID-19 cases may be confounded by these measures in the countries that were quickest to adopt them and had already passed their peak of daily new COVID-19 cases (e.g., Italy). Given the estimated average lag between infection with the coronavirus and a reported COVID-19 diagnosis, we think that only the last two data points may be affected.

Other factors are also likely to have contributed to the spread of the coronavirus in Europe. French and Austrian ski resorts were responsible for initial infections in the UK and other Northern European countries (Flaxman et al. 2020; Hruby, 2020), although the tertiary education and GDP variables are likely to capture the influence of winter holidays and international travel on country differences in COVID-19 during the study period. Smoking is another variable associated with the proliferation of COVID-19, because of its social function (Paul et al. 2010) and because smokers are more likely to touch their face and mouth and to have chronic health conditions (GBD 2015 Tobacco Collaborators, 2017; Science Media Centre, 2020). However, we think that smoking is a more important factor to consider in individual-level or small-area studies, as analysis showed smoking rates to be higher in the (mainly Eastern European) countries where the coronavirus was late in taking hold.

Another issue of concern is country differences in testing for COVID-19. Some countries only tested people admitted to hospital, or ramped up their testing programmes much later in the first outbreak than other countries. This implies that some of the earlier data points in particular will underestimate the true prevalence of COVID-19, as they mainly pertain to symptomatic people (Farge & Revill, 2020; Kohns Vasconcelos et al. 2021; Wikipedia, 2020). Apart from symptom status being unknown, the data contained no information on where the coronavirus was contracted. Such data have only been used in (family) case cluster studies (e.g., Chan et al. 2020; Danis et al. 2020; Fong et al. 2020). In the context of our study, it would have been of particular interest to distinguish between the proportion of infections that took place outside the home (more likely among children and employed people) and those within the household (more probable among older people and those living in multigenerational or overcrowded households). As we analysed different periods of the first wave, such information could have given us more insight into the importance of particular variables in different settings of infection.

A recurrent problem of any national-level analysis that uses aggregate data is that any association found might not necessarily reflect associations that are observed at the individual level, a shortcoming known as the “ecological fallacy”. That said, we did not obtain results contradictory to what we expected.

Finally, ESS data are not available for all European countries, so different results may be obtained if data for other countries become available.

To conclude, the main takeaway message for public health policy is that, while disentangling the effects of a variety of social, cultural, economic, and demographic factors on the diffusion of the COVID-19 epidemic is a difficult task, the importance of specific determinants in spreading the virus is likely to change over time. In a European setting, we found that factors associated with the level of economic development and social ties were particularly important initially, while population density and cultural factors were likely to have facilitated further spread of the virus once it took hold. However, as Chinazzi et al. (2020) showed, international travel restrictions alone would not be enough to curb the initial spread of a virus, as disease transmissibility also needs to be reduced through public health interventions and behavioural changes. Taiwan and New Zealand are examples of the efficacy of border management policies and stringent public health interventions. Although New Zealand was helped by its isolation, Taiwan was even more successful in limiting the number of reported infections, despite its close proximity to the Chinese mainland, thanks to its existing disease and outbreak surveillance systems and its effective face mask distribution and promotion. Both countries also had strict border management policies, quarantine rules, and secure facilities for incoming travellers in place, and developed contact tracing (Summers et al. 2020). Conversely, European island nations, particularly Ireland and Iceland, were clearly much less successful in dealing with the first wave of the pandemic, as their infection rates were among the highest in the world (CSSE, 2020). Based on our results, we therefore recommend that, in future outbreaks of coronavirus-like epidemics for which no vaccine is yet available, travel restrictions and very strict social distancing measures be implemented quickly.
This is especially important in densely populated countries with strong international economic and social ties, in order to minimise the proliferation of cases during the secondary transmission phase that occurs within households and nursing homes. A recommendation for future research is to perform a European analysis at the subnational level, given the unequal distribution of COVID-19 within countries (e.g., the north versus the south of Italy).